ABSTRACT

The computational method of parametric probability analysis is introduced. It is demonstrated how to embed
logical formulas from the propositional calculus into parametric probability networks, thereby enabling sound
reasoning about the probabilities of logical propositions. An alternative direct probability encoding scheme is
presented, which allows statements of implication and quantification to be modeled directly as constraints on
conditional probabilities. Several example problems are solved, from Johnson-Laird's aces to Smullyan's zombies.
Many apparently challenging problems in logic turn out to be simple problems in algebra and computer science:
systems of polynomial equations or linear optimization problems. This work extends the mathematical logic and
parametric probability methods invented by George Boole.

1 INTRODUCTION 

This essay introduces parametric probability analysis, a method to compute useful symbolic and numeric results
from probability models that contain parameters treated algebraically as unknown variables. A convention is
provided to embed formulas from the propositional calculus into such parametric probability models, thereby
enabling sound reasoning about the probabilities of logical propositions. An alternative scheme of direct
probability encoding is presented, which allows statements of implication and consequence to be modeled directly
as constraints on conditional probabilities (without intermediate formulas from the propositional calculus). With
direct encoding, probabilities can be used to extend classical logical quantifiers into more precise proportional
statements of quantification (or even arbitrary polynomial constraints involving such proportions). Several
example problems are analyzed, using the Probability Query Language (PQL) computer program developed by the
author. It turns out that many apparently challenging problems in logic and probability are in fact simple
problems in algebra and computer science: systems of polynomial equations and inequalities, general search
problems, polynomial fractional optimization problems, and in some cases just linear optimization problems.

This work is a continuation of George Boole's pioneering formulation of mathematical logic and probability,
codified in his 1854 Laws of Thought [3]. It complements the formalization of Hailperin [12] by parsing Boole's
notation in a substantially different way that respects Boole's overloaded use of operator signs and numerals.
Recognizing Boole's operator overloading, as did Venn many years ago [32], obviates the need to invoke unusual
arithmetic or heaps that are not quite sets in order to explain Boole's calculations. There were several
innovations in Boole's methodology: the representation of logical formulas and axioms as polynomial formulas and
equations; a means to embed logical formulas within probability models; a database-and-query model of
interaction; and two-phase inference, with a primary phase of symbolic probability inference followed by a
secondary phase of more general algebraic and numerical analysis. The computational method introduced here adds
several features to extend Boole's original work: explicit probability-network models; structured probability
queries; clearer semantics for embedding propositional-calculus formulas versus directly encoding implication
with conditional probabilities; and broadened secondary analysis that includes search and general algebra as
well as optimization.
Variable | Role      | Description                              | Domain
---------+-----------+------------------------------------------+-------------
P        | Primary   | Proposition: It is a bird                | {T, F}
Q        | Primary   | Proposition: It can fly                  | {T, F}
R        | Primary   | Value of (P -> Q)                        | {T, F}
A        | Primary   | Value of (3)                             | {3}
B        | Primary   | Number true in {P, Q, R}                 | {0, 1, 2, 3}
C        | Primary   | Value of (A - B)                         | {0, 1, 2, 3}
x        | Parameter | Probability that P is true               | [0, 1]
y        | Parameter | Probability that Q is true if P is true  | [0, 1]
z        | Parameter | Probability that Q is true if P is false | [0, 1]

Table 1 Variables in the probability network describing a creature that might be a bird and might be able to fly. 



2 PARAMETRIC PROBABILITY NETWORKS 

2.1 The Problem with Penguins 

Let us begin with a problem that has vexed quite a few philosophers and computer scientists. How can logic be 
used to reason about an implication that is true sometimes but not always? Following artificial-intelligence tradition 
ESI we contemplate the problem that most birds can fly but some cannot. To model this problem we shall build 
a parametric probability network denoted M_pq. This modeling formalism is built from the symbolic algebra used
by de Moivre and Bernoulli in their foundational 18th-century treatises on probability [2, 22]; from the
parametric treatment of probability developed by Boole in the 19th century [3]; from the axioms of probability
theory provided by Kolmogorov in the early 20th century [18]; from the relational databases developed by Codd in
the 1970s [8]; and from the Bayesian belief networks and influence diagrams developed in the 1980s by Pearl,
Howard, and others [27, 13]. A parametric probability network has four parts: a set of variables, a set of
constraints, a network graph, and a set of component probability tables. These parts can be described in a
formal, structured language that is suitable for processing by computers as well as by humans.

2.2 Variables and Constraints 

First let us introduce two primary variables to represent logical propositions about a hypothetical creature: P
that it is a bird, and Q that it can fly. For this example each of these primary variables may be either true or
false, abbreviated T and F. We would like to consider the truth value of the logical statement 'P implies Q' and
its relation to various probabilities involving P and Q. To facilitate this we add a third primary variable R
defined as the value of the formula P -> Q, where the arrow denotes the usual 'if/then' material-implication
operator of propositional logic [10]. This definition is denoted R := (P -> Q), with the custom that the
expression after the definition sign uses another mathematical system that is embedded within the probability
model (in this case the propositional calculus). Following Nilsson, one way to describe the truth value of a
logical proposition in the context of probability is to use the probability that the proposition is true [23]:
let us call this 'fractional truth value'. Thus Pr (P = T) describes the fractional truth value of the atomic
formula P; Pr (Q = T) describes the fractional truth value of the atomic formula Q; and Pr (R = T) describes the
fractional truth value of the compound formula P -> Q.

Next let us introduce three more variables, x, y, and z, to be used as parameters for specifying the
probabilities of P and Q. Since parameters are treated differently from primary variables during analysis, we
maintain a distinction between these two roles that a variable may play. In contrast to P, Q, and R, which share
the domain {T, F} of two possible values, the parameters x, y, and z may take real-number values between zero
and one: thus the domain of each is the interval [0, 1]. Finally let us add three more primary variables A, B,
and C, with the declarations that B and C share the domain {0, 1, 2, 3} of four possible integer values and that
A has the set {3} of a solitary possible value. We define A as the value of the number 3; B as the number of true
propositions in the set {P, Q, R}; and C as the value of the difference A - B. Thus A := 3 and C := (A - B), now
using integer arithmetic instead of propositional logic as the embedded mathematical system. Table 1 lists all
the variables in the parametric probability network M_pq. Using m for the number of primary variables and n for
the number of parameters, this model has m = 6 and n = 3.

Figure 1 Probability network graph of model M_pq, with idiosyncratic graphical notation explained in Section 2.3.

This document uses the typographical conventions that primary variables are rendered as uppercase italic letters
(A, P, Q, etc.) and parameters as lowercase italic letters (x, y, etc.); variable names are case-sensitive, so
for example x and X are considered different variables. States of primary variables are rendered in sans-serif
type (such as True, False, T, F, 0, 1, 2, 3). These practices are intended to distinguish numbers and formulas in
embedded mathematical systems (such as 'P -> Q', which is a formula in the propositional calculus) from numbers
and formulas in the host probability model (such as '1 - p + pq', which is a formula in the algebra of
polynomials with rational coefficients). Additionally, different symbols are used to disambiguate different
meanings of the equal sign: the colon and equal sign := for definition or assignment; the double right arrow =>
for evaluation (as in 2 + 2 => 4); and the standard equal sign = for the test or assertion of equality. For this
presentation, all primary variables are discrete (with finite sets of possible values) and all parameters are
continuous (taking rational or real-number values). But in general, parametric probability networks are allowed
to have continuous primary variables and discrete parameters too.

2.3 Network Graph and Component Probability Tables 

The graph in Figure 1 shows how the variables in the parametric probability network M_pq are related to one
another; it also dictates which component probability tables must be specified in order to complete the model.
Following standard probability-network notation, each oval node in this directed acyclic graph represents a
primary variable and each directed edge shows a correlation or functional dependence relationship. The absence
of an edge is an assertion of independence. In the author's idiosyncratic notation parameters are included in
the graph and drawn with parallelogram nodes; furthermore a clique (fully-connected subset of nodes) will be
indicated by a small diamond and undirected edges (as illustrated in the examples in Section 7). Double borders
on a node indicate that the corresponding variable is deterministic: for a primary variable this means that
every component probability must be either 0 or 1; for a parameter this means that its value must be fixed at
some constant.

The probability of each primary variable must be specified as a function of its parents in the network graph.
Thus for the model M_pq we must provide several component probabilities based on the graph in Figure 1. We must
specify the probability of the primary variable P as a function of its parent, the parameter x; and the
conditional probability of Q given its primary-variable parent P as a function of its parameter parents y and z.
These component probability distributions appear in Table 2 parts (a) and (b). We must also specify the
conditional probability of R given its parents P and Q. For this we transcribe the truth table of the logical
formula P -> Q into the conditional probability table shown as Table 2 part (c). The conditional probabilities
for B given P, Q, and R, shown as Table 2 part (d), encode the number of true parent variables. The primary
variable A has one possible state which is assigned probability one as shown in Table 2 part (e). Finally, for
the primary variable C we specify as Table 2 part (f) a transcription of the table that gives the value of the
arithmetical formula A - B using A = 3 and integer values of B between 0 and 3. Component probabilities specified
by the user are designated Pr_0 (...), with the subscript 0 used to distinguish these input values from the
output probabilities Pr (...) later computed from them.
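(a) Pr_0 (P)

P | Pr_0 (P)
--+---------
T | x
F | 1 - x

(b) Pr_0 (Q | P)

P | Q = T | Q = F
--+-------+------
T | y     | 1 - y
F | z     | 1 - z

(c) Pr_0 (R | P, Q)

P Q | R = T | R = F
----+-------+------
T T | 1     | 0
T F | 0     | 1
F T | 1     | 0
F F | 1     | 0

(d) Pr_0 (B | P, Q, R)

P Q R | B = 0 | B = 1 | B = 2 | B = 3
------+-------+-------+-------+------
T T T | 0     | 0     | 0     | 1
T T F | 0     | 0     | 1     | 0
T F T | 0     | 0     | 1     | 0
T F F | 0     | 1     | 0     | 0
F T T | 0     | 0     | 1     | 0
F T F | 0     | 1     | 0     | 0
F F T | 0     | 1     | 0     | 0
F F F | 1     | 0     | 0     | 0

(e) Pr_0 (A)

A | Pr_0 (A)
--+---------
3 | 1

(f) Pr_0 (C | A, B)

A B | C = 0 | C = 1 | C = 2 | C = 3
----+-------+-------+-------+------
3 0 | 0     | 0     | 0     | 1
3 1 | 0     | 0     | 1     | 0
3 2 | 0     | 1     | 0     | 0
3 3 | 1     | 0     | 0     | 0

Table 2 Component probability tables for the model M_pq. The subscript in Pr_0 (...) identifies these as user input.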



In general the component probability table for a primary variable must contain an element for each of its
possible states, given every unique combination of states of its primary-variable parents (an empty set of
parents is considered to have one combination of states). Each of these component probabilities must be a
polynomial function of the model parameters (with real or rational coefficients). The user may specify arbitrary
polynomial equality and inequality constraints on the model parameters; the system adds constraints as needed to
enforce the laws of probability (that the feasible values of each component probability must lie between zero
and one, and that at every feasible point the probabilities of mutually exclusive and collectively exhaustive
events must add up to one). Parameters do not get their own probability distributions; that is precisely how
they differ from primary variables. Hence in M_pq there are no component probability tables for the parameters
x, y, and z. However parameters are always subject to algebraic constraints, in this case the zero-one bounds
given in Table 1: thus 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 ≤ z ≤ 1.
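The laws-of-probability checks described above can be illustrated with a minimal Python sketch. This is not the author's PQL implementation: the function name `check_table`, the dictionary layout, and the sample parameter values are all assumptions made for illustration, and the check is done numerically at one parameter point rather than symbolically.

```python
from fractions import Fraction

def check_table(table, params):
    """Evaluate each row of a component table at the given parameter values
    and check that every entry lies in [0, 1] and that each row sums to one."""
    for parents, row in table.items():
        values = [entry(**params) for entry in row]
        if any(v < 0 or v > 1 for v in values):
            return False
        if sum(values) != 1:
            return False
    return True

# Hypothetical encoding of the table for Q given P: one row per state of P,
# with entries (Q = T, Q = F) written as functions of the parameters.
pr0_q_given_p = {
    ("T",): [lambda x, y, z: y, lambda x, y, z: 1 - y],
    ("F",): [lambda x, y, z: z, lambda x, y, z: 1 - z],
}

inside = dict(x=Fraction(1, 2), y=Fraction(9, 10), z=Fraction(1, 10))
outside = dict(x=Fraction(1, 2), y=Fraction(3, 2), z=Fraction(1, 10))
print(check_table(pr0_q_given_p, inside))   # parameters within the bounds
print(check_table(pr0_q_given_p, outside))  # y exceeds its upper bound
```

The rows sum to one for any parameter values because each row has the form (t, 1 - t); only the zero-one bounds on the parameters can be violated, which is why those bounds are the constraints that matter for this model.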

2.4 Computable Specification 

Here is the computable specification of the parametric probability model M_pq. The corresponding file was processed
by the author's computer program to generate the figures and tables in this section.

// basicl.pql: with propositional calculus and integer arithmetic embedded

parameter x { label = "Probability that $P$ is true"; range = (0,1); }
parameter y { label = "Probability that $Q$ is true if $P$ is true"; range = (0,1); }
parameter z { label = "Probability that $Q$ is true if $P$ is false"; range = (0,1); }

primary P { label = "Proposition: It is a bird"; states = binary; }
probability ( P ) { data = (x, 1-x); noverify; }

primary Q { label = "Proposition: It can fly"; states = binary; }
probability ( Q | P ) { data = (y, 1-y, z, 1-z); noverify; }

primary R { label = "Value of $(P \rightarrow Q)$"; states = binary; }
probability ( R | P Q ) { function = "R <-> P -> Q ? 1 : 0"; }

primary A { label = "Value of $(3)$"; states = range( 3, 3 ); }
probability ( A ) { data = (1); }

primary B { label = "Number true in $\{P,Q,R\}$"; states = range( 0, 3 ); }
probability ( B | P Q R ) { function = "B == P + Q + R ? 1 : 0"; }

primary C { label = "Value of $(A - B)$"; states = range( 0, 3 ); }
probability ( C | A B ) { function = "C == A - B ? 1 : 0"; }

net { graph = 'subgraph { rank=same; "P"; "Q"; }'; } // hint for graph drawing

Like the Structured Query Language (SQL) for relational databases [9], the author's Probability Query Language
includes a data definition language for specifying models and a data manipulation language for asking queries.
The PQL data definition language, which you see illustrated above, has syntax like the ubiquitous C programming
language [16] and was also inspired by the NET modeling language from the Hugin system for Bayesian-network
inference [14]. The PQL data manipulation language, which is demonstrated below, uses keywords that the user may
type as commands to an interactive shell or alternatively include in source files for a batch processor. The
command-line shell (called pqlsh) and the batch processor (called pqlpp) are built atop the Tcl scripting
language [25]. There is implicitly a structured language for the results of PQL queries as well; these results
are essentially relational-database tables whose entries include natural numbers, character strings, and
fractional polynomials with rational coefficients.

3 EMBEDDED MATHEMATICAL SYSTEMS 

Probability can be used to reason about formulas that are governed by other mathematical systems such as the
propositional calculus or integer arithmetic. Such mathematical formulas can be embedded within parametric
probability networks using the method described here. There are three main aspects to embedding mathematical
formulas in probability networks: assigning prior probabilities to the probability-network variables copied from
the mathematical variables; assigning conditional probabilities to the probability-network variables created to
represent the non-atomic mathematical formulas; and modeling unknown mathematical functions.

3.1 Prior Probabilities for Propositional Variables

To begin the embedding process we copy the variables used in the mathematical formulas of interest into primary
variables in the parametric probability network; then we assign these primary variables a prior probability
distribution. Let us take the simple view that a mathematical formula is a finite-length string that may contain
only constants, variables, operator signs, and parentheses; a set of grammatical rules dictates which strings
constitute well-formed formulas. The constants must be members of some set K of elementary values, and each
variable P_1, P_2, ..., P_v ranges over values in this set K. When each variable in a formula is assigned a
constant value from K, the formula itself must also have a value in K; thus we imagine the set K to be the
domain of a mathematical structure that is closed under the allowed operations. For this discussion, we assume
that each formula is finite in length and that the number of formulas under consideration is finite; hence the
number v of primary variables is also finite. Let us further limit our attention to the case that the set K of
elementary values is finite, with some size d; this is adequate for modeling 'logical' systems with limited
numbers of elementary truth values (for example the usual two).

UNINFORMATIVE PRIORS, PARAMETRICALLY The simplest way to specify a prior probability distribution on the
variables P_1 through P_v is to avoid independence assertions and to specify directly the joint probability
distribution Pr_0 (P_1, P_2, ..., P_v). Recall that the subscript in Pr_0 (...) designates component
probabilities input by the user, as opposed to computed probabilities Pr (...) output by the system. In this
joint-prior case the corresponding probability-network graph has the variables P_1 through P_v joined as a
clique. With d elementary values in the set K it is necessary to provide d^v individual probabilities in order
to specify completely the distribution Pr_0 (P_1, P_2, ..., P_v). The laws of probability require that each
probability must lie between zero and one, and that the sum of the probabilities in this joint distribution must
be one (since they describe mutually exclusive and collectively exhaustive events). To provide a truly
uninformative prior probability distribution, we should use parameters to state exactly these properties and
nothing more. For example, we might specify the respective probabilities as the variables x_1, x_2, ..., x_{d^v}
subject to the constraints that each x_i ≥ 0, each x_i ≤ 1, and the sum Σ_i x_i = 1.

It is also possible to specify the joint distribution Pr_0 (P_1, P_2, ..., P_v) indirectly by using a
probability-network subgraph with directed edges; if this subgraph is fully-connected then it still does not
introduce independence assertions. For example we might specify the unconditioned probability distribution
Pr_0 (P_1) and then the conditional probability distributions Pr_0 (P_2 | P_1), Pr_0 (P_3 | P_1, P_2), and so on
until Pr_0 (P_v | P_1, P_2, ..., P_{v-1}). De Moivre used exactly this construction centuries ago to describe
the joint probability of several dependent events [22]. In general many different fully-connected network graphs
are possible; each requires a particular set of probabilities to be supplied by the user.

JUST ADD INFORMATION There are several ways to add information beyond uninformative parametric prior
probabilities, should the user wish to do so: through the choice of values in the component probability tables;
through explicit algebraic constraints on the parameters used to specify component probabilities; and through
independence assertions (modeled as the absence of arcs in the probability-network graph) which indirectly
provide algebraic constraints. For example, the user may desire to specify that the primary variables
P_1, P_2, ..., P_v are probabilistically independent: in this case they are not directly connected in the
network graph, and the joint probability Pr (P_1 = k_1, P_2 = k_2, ..., P_v = k_v) that each variable P_i takes
the elementary value k_i is constrained to equal the product of the concordant individual component
probabilities: Pr_0 (P_1 = k_1) × Pr_0 (P_2 = k_2) × ... × Pr_0 (P_v = k_v). As another example, it may be
desirable to constrain each prior probability to be strictly greater than zero, in order to specify that no
combination of primary-variable values is considered impossible a priori. The tools of parametric probability,
including graphical models and algebraic constraints, allow the user to say exactly what he or she means about
prior probabilities.
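The independence constraint just described (the joint probability equals the product of the individual component probabilities) can be sketched in a few lines of Python. The helper name `independent_joint`, the variable names, and the numeric marginals below are hypothetical values chosen only to make the example concrete.

```python
from fractions import Fraction
from itertools import product

def independent_joint(marginals):
    """Build the joint table implied by independence: each joint entry is the
    product of the individual probabilities of the corresponding states."""
    names = list(marginals)
    joint = {}
    for states in product(*(marginals[n] for n in names)):
        prob = Fraction(1)
        for name, state in zip(names, states):
            prob *= marginals[name][state]
        joint[states] = prob
    return joint

# Illustrative marginals for two independent binary variables.
marginals = {
    "P1": {"T": Fraction(1, 3), "F": Fraction(2, 3)},
    "P2": {"T": Fraction(1, 4), "F": Fraction(3, 4)},
}
joint = independent_joint(marginals)
for states, prob in joint.items():
    print(states, prob)
```

With v independent variables over d values, this representation needs only v small tables rather than d^v free joint entries, which is the sense in which independence assertions add information.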

FOR THE BIRDS To illustrate copied mathematical variables and prior probabilities, in the M_pq model there are
two embedded formulas P -> Q and A - B. The first formula uses the propositional calculus, which includes the
set {T, F} of elementary values representing truth and falsity as well as the operator -> for material
implication (along with the usual operators ∧, ∨, ¬, etc.). We copy the propositional variables P and Q into the
probability network as primary variables, each with the domain {T, F}. In lieu of the joint distribution
Pr_0 (P, Q), for this model we specify the component probabilities indicated by the fully-connected subgraph
with a directed arc from P to Q; hence the required component probabilities are Pr_0 (P) and Pr_0 (Q | P) which
appear in Table 2. The constraints on the real-valued parameters x, y, and z used in these component probability
tables provide no more information than the laws of probability require. The value of each parameter must lie
between 0 and 1. Here algebra takes care of the sum-to-one constraints; for example in Pr_0 (P) it is
tautological that x + (1 - x) = 1. The second embedded formula A - B uses integer arithmetic, which includes the
set {0, 1, 2, 3} of elementary values (for convenience we focus on this finite subset of Z) and the operator -
for subtraction (along with the usual operators +, ×, etc.). The variables A and B are copied into the
probability network as primary variables. For the definition A := 3 we add the prior information that the only
possible state of A is 3; hence the simple component probability table in Table 2 part (e) assigns probability
one to this event. The component probability table specified in Table 2 part (d) encodes that the state of B
expresses the number of primary variables among P, Q, and R that have the state T.

3.2 Conditional Probabilities for Operations 

To continue the embedding process we introduce additional primary variables for the compound formulas of
interest. For each non-atomic formula φ_i, we introduce a new primary variable S_i. The input component
probability table for S_i must specify the conditional probability of S_i given the variables
P_{j_1}, P_{j_2}, ..., P_{j_a} used in the formula φ_i. This conditional probability table
Pr_0 (S_i | P_{j_1}, P_{j_2}, ..., P_{j_a}) must assign probability one to the appropriate value of the formula
given each combination of values of its arguments.

Returning to the M_pq model, we introduce the primary variable R to represent the value of the compound logical
formula P -> Q and we add the primary variable C for the compound arithmetical formula A - B. To complete the
definition R := (P -> Q) the input component probability table Pr_0 (R | P, Q) assigns probability one to the
appropriate elementary truth value of the statement of material implication, given each combination of truth
values of its arguments. The conditional probability table is derived in the obvious way from the related
logical truth table:

P | Q | P -> Q
--+---+-------
T | T | T
T | F | F
F | T | T
F | F | T

Pr_0 ((P -> Q) | P, Q)

P Q | (P -> Q) = T | (P -> Q) = F
----+--------------+-------------
T T | 1            | 0
T F | 0            | 1
F T | 1            | 0
F F | 1            | 0

(1)



For example because the logical formula T -> F has the value F we have assigned the corresponding conditional
probability Pr_0 ((P -> Q) = F | P = T, Q = F) the value 1 and its complement
Pr_0 ((P -> Q) = T | P = T, Q = F) the value 0. In Equation 1 the complete formula P -> Q instead of its
abbreviation R appears in the heading of the table Pr_0 ((P -> Q) | P, Q); otherwise this component probability
table is the same as Pr_0 (R | P, Q) which appears as Table 2 part (c). The other definition C := (A - B) in the
M_pq model is handled essentially the same way. We consider the portion of the mathematical function table for
the arithmetical subtraction operator when its first argument is 3 and its second argument is a member of the
set {0, 1, 2, 3}. The derived component probability table follows:



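A | B | A - B
--+---+------
3 | 0 | 3
3 | 1 | 2
3 | 2 | 1
3 | 3 | 0

Pr_0 ((A - B) | A, B)

A B | (A - B) = 0 | (A - B) = 1 | (A - B) = 2 | (A - B) = 3
----+-------------+-------------+-------------+------------
3 0 | 0           | 0           | 0           | 1
3 1 | 0           | 0           | 1           | 0
3 2 | 0           | 1           | 0           | 0
3 3 | 1           | 0           | 0           | 0

(2)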



The same table appears as Table 2 part (f) labeled with the abbreviation C instead of the full embedded formula A - B.
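The transcription from a function table to a deterministic conditional probability table, used for both embedded formulas above, can be sketched in Python as follows. The helper `deterministic_cpt` and its dictionary layout are assumptions made for illustration; they are not part of PQL.

```python
from itertools import product

def deterministic_cpt(func, parent_domains, child_domain):
    """Return {parent_states: {child_state: 0 or 1}}, placing probability
    one on the value func(*parent_states) and zero elsewhere."""
    table = {}
    for parents in product(*parent_domains):
        value = func(*parents)
        if value not in child_domain:
            raise ValueError(f"{value!r} is outside the child domain")
        table[parents] = {c: int(c == value) for c in child_domain}
    return table

TF = ("T", "F")

def implies(p, q):
    """Material implication on the elementary values 'T' and 'F'."""
    return "F" if (p == "T" and q == "F") else "T"

# Table for the formula P -> Q given P and Q.
cpt_r = deterministic_cpt(implies, [TF, TF], TF)
# Table for the formula A - B with A fixed at 3 and B in {0, 1, 2, 3}.
cpt_c = deterministic_cpt(lambda a, b: a - b, [(3,), (0, 1, 2, 3)], (0, 1, 2, 3))

for parents, row in cpt_r.items():
    print(parents, row)
for parents, row in cpt_c.items():
    print(parents, row)
```

The domain check matters for arithmetic: subtraction is closed on this example only because A is fixed at 3, so every difference stays within {0, 1, 2, 3}.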



3.3 Known and Unknown Unknowns 

The number of elementary truth values in a logical system is independent of the idea of assigning a probability
to each possible value (or the idea of considering sets of elementary values). For example in ordinary algebra,
in order to express the idea that an integer is unknown, we do not imagine that there is an integer called
'unknown' in the set Z. Instead we introduce a symbolic variable and declare that it is constrained to take
values in the set of integers, for example Y with Y ∈ Z. We can expand this set-based declaration Y ∈ Z into a
parametric probability distribution by further specifying that there is some probability that the variable Y
takes each integer value k in the set Z. In this context, to express perfect ignorance about the value of the
integer-valued variable Y, we should admit only that each probability p_k takes a real value between zero and
one and that all of the probabilities add up to one: hence we constrain each p_k to be real with 0 ≤ p_k ≤ 1 and
the infinite sum Σ_k p_k = 1.

Returning to logic, let us consider an embedded system similar to the propositional calculus but with the set
{True, False, Unknown} of three elementary truth values instead of the usual two. Now, in this 3-valued logic it
would be a different thing to say that the value of some variable V is unknown than to say that the value of V
is certainly the elementary value called Unknown. For the former assertion (the value of V is unknown) we should
start with the set-based declaration V ∈ {True, False, Unknown} and optionally expand this declaration into a
parametric probability distribution with only the essential probability constraints, such as:



V       | Pr_0 (V)
--------+---------
True    | x_1
False   | x_2
Unknown | x_3

(3)



But for the latter assertion (Unknown is the value of V) we should assign probability one to the value named Unknown: 



V       | Pr_0 (V)
--------+---------
True    | 0
False   | 0
Unknown | 1

(4)
Many multivalued logics confuse these two ideas (considering sets of possible elementary values, perhaps with
probabilities attached, versus adding new elementary values).

We can extend the courtesy of parametric representation to unknown functions as well as to unknown variables.
For example, returning to 2-valued logic, the following conditional probability table and constraints describe
an unknown binary operation R* whose arguments P and Q take values in {T, F}:

Pr_0 (R* | P, Q)

P Q | R* = T | R* = F
----+--------+--------
T T | t_1    | 1 - t_1
T F | t_2    | 1 - t_2
F T | t_3    | 1 - t_3
F F | t_4    | 1 - t_4

t_i ∈ ℝ, t_i ∈ {0, 1}

(5)



This parametric probability table encodes 2^4 = 16 possible logical functions. For example, the
material-implication function R*_1011 := (P -> Q) corresponds to the vector (1, 0, 1, 1) of values for the
parameters (t_1, t_2, t_3, t_4); the biconditional R*_1001 := (P <-> Q) corresponds to the vector (1, 0, 0, 1);
and the always-false function R*_0000 := F corresponds to the vector (0, 0, 0, 0).
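To make the count of sixteen concrete, the following Python sketch enumerates every 0/1 assignment to the four parameters of the table (called `t` here) and recovers the vectors quoted above. The row order (TT, TF, FT, FF) follows the table; the helper names are illustrative assumptions.

```python
from itertools import product

# Row order of the conditional probability table: (P, Q) argument pairs.
ARGS = [("T", "T"), ("T", "F"), ("F", "T"), ("F", "F")]

def function_from_vector(t):
    """Map a 0/1 vector to the logical function it encodes: a 1 in a row
    means the function returns T for that pair of arguments."""
    return {args: ("T" if bit == 1 else "F") for args, bit in zip(ARGS, t)}

def vector_of(op):
    """Parameter vector of a Boolean operator given as a Python callable."""
    return tuple(1 if op(p == "T", q == "T") else 0 for p, q in ARGS)

all_functions = {t: function_from_vector(t) for t in product((0, 1), repeat=4)}

implication = vector_of(lambda p, q: (not p) or q)
biconditional = vector_of(lambda p, q: p == q)
always_false = vector_of(lambda p, q: False)

print(len(all_functions))
print(implication, biconditional, always_false)
```

Leaving the four parameters unconstrained within {0, 1} therefore expresses ignorance about which binary operation is meant, while fixing them selects one particular operation.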



4 PRIMARY ANALYSIS: SYMBOLIC PROBABILITY INFERENCE 



Having defined the parametric probability network M_pq with embedded formulas from the propositional calculus
and from integer arithmetic, let us now proceed with some analysis. We follow the ingenious framework laid out
by Boole in his Laws of Thought [3], which included a database-and-query model of interaction between the user
and the analytic system, as well as a two-phase model of inference. Boole considered a parametric probability
model (the data in his terminology), to which a probability query could be posed (his quaesitum). In the first
phase of analysis, a polynomial formula (Boole's final logical equation) was calculated to answer the query; the
variables in this polynomial were the unknown parameters in the probability model. In the second phase of
analysis, the minimum and maximum feasible values of this polynomial formula were computed (Boole's limits),
subject to constraints that expressed the laws of probability. Parametric probability analysis follows Boole's
two-phase model of inference: in the primary phase of analysis, parametric probability networks and structured
probability queries are processed by symbolic probability-network inference to compute polynomial solutions; and
in the secondary phase of analysis, these polynomial solutions are used for additional algebraic and numerical
analysis.



4.1 Structured Probability Queries 

Each probability-table query asks for the probability distribution of a principal set of primary variables,
given some conditioning set of primary variables; the remaining primary variables are considered the marginal
set. Any of these three sets may be empty, but every primary variable must be assigned to one of these positions
to make a valid query. Therefore for a parametric probability network with m primary variables, there are 3^m
possible probability-table queries (thus 3^6 => 729 possibilities for the model M_pq). The result of a
probability-table query is a probability table with one or more elements. Each element in this result table is a
polynomial function of the component probabilities or a quotient of such polynomials; as a special case an
element can be a plain rational number.
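The count of possible probability-table queries can be checked by brute-force enumeration. The Python sketch below assigns each primary variable to one of the three roles; the function name `table_queries` and the dictionary representation of a query are assumptions for illustration only.

```python
from itertools import product

ROLES = ("principal", "conditioning", "marginal")

def table_queries(variables):
    """Yield every assignment of each primary variable to exactly one of the
    three roles; any of the resulting sets may be empty."""
    for roles in product(ROLES, repeat=len(variables)):
        query = {role: [] for role in ROLES}
        for var, role in zip(variables, roles):
            query[role].append(var)
        yield query

queries = list(table_queries(["P", "Q", "R", "A", "B", "C"]))
print(len(queries))

# The query Pr(R): principal set {R}, empty conditioning set.
pr_r = [q for q in queries if q["principal"] == ["R"] and not q["conditioning"]]
print(pr_r)
```

Each query is determined entirely by its principal and conditioning sets, since the marginal set is whatever remains.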

For example, to ask for the fractional truth value of the proposition P -> Q, which is represented in the model
as the variable R, we ask the probability-table query Pr (R). This query has the principal set {R}; its
conditioning set is empty; its marginal set {P, Q, A, B, C} contains the remaining primary variables. The
following session shows how to use the pqlsh command-line interface to load the probability model and to issue
this probability-table query; here pqlsh> is the system prompt, user input is shown in italic type, and input
follows the syntax of the Tcl scripting language [25]:

pqlsh> set m [pql::load basicl.pql t] ; return;
pqlsh> set tr [$m table R] ; $tr infer; $tr print -index

Index | R | Pr( {R} )
1     | T | 1 - x + x*y
2     | F | x - x*y


Such results can also be generated in typeset form: PQL commands included as special comments in LaTeX documents
are replaced by the pqlpp preprocessor with the appropriate query results (that is how this document was generated).
Inspecting the results above and recalling the parameter definitions in Table 1, you can see that the fractional
truth value of the formula P → Q is a simple polynomial function of the parameters x and y, where x is the input
probability that P is true and y is the input conditional probability that Q is true given that P is true. The bound
constraints x ∈ [0, 1] and y ∈ [0, 1] given in Table 1 are considered to be part of the result (as would be additional
parameter constraints if there were any): thus the result Pr(R) is the above probability table along with the constraints
that the parameters x and y take real values between zero and one.

Inspecting the first element 1 − x + xy in the result of the table query above, note that Pr(R = T) = 0 if and only if
x = 1 and y = 0 (considering also the preexisting constraints 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1). In terms of the example, the
statement P → Q that bird implies flight is certainly false exactly when the creature is certainly a bird (x = 1) and it
is certain that no bird can fly (y = 0). Conversely, from either constraint x = 0 or (x,y) = (1,1) it follows by simple
algebra that Pr(R = T) = 1. In other words, the implication P → Q that bird implies flight is certainly true if the
creature is certainly not a bird (x = 0); or if the creature is certainly a bird and it is certain that a bird can fly (x = 1
and y = 1). Other values of x and y give intermediate values of Pr(R = T) and its complement Pr(R = F).

Moving on, in order to ask the conditional probability of Q given P we use the probability-table query Pr(Q | P):

pqlsh> set tqp [[$m table Q | P] infer]; $tqp print -

Index | P | Q | Pr( {Q} | {P} )
1     | T | T | (x*y) / (x)
2     | T | F | (x - x*y) / (x)
3     | F | T | (z - x*z) / (1 - x)
4     | F | F | (1 - x - z + x*z) / (1 - x)



In this case the principal set is {Q}, the conditioning set is {P}, and the marginal set is {R,A,B,C}. Again the
constraints that the values of x, y, and z must lie between 0 and 1 are considered part of the result.

4.2 Simple Table-Based Probability Inference 

Symbolic probability-network inference follows principles that were already well-described by de Moivre in the early 
18th century (notably before the famous paper from Bayes; in fact Bayes referred explicitly to de Moivre's work) 
E2l [Tl. Modern probability-network inference methods focus on efficiency through clever factoring strategies and 
sophisticated graph manipulations [20] [21]. But for our purposes, inelegant brute-force inference will suffice; this
simple approach also helps to elucidate the polynomial form of the probabilities that are computed. Moreover this 
simple inference method offers its own routes to improved performance: it happens that the necessary arithmetical op- 
erations can be performed in parallel (taking advantage of computers with multiple processors), and that the collected 
calculations are essentially relational-database operations (for which well-optimized software has been developed). 

Simple table-based probability inference requires three steps: joining, aggregation, and division. In the first step the 
component probability tables are joined into the full-joint probability table, by multiplying the component probabilities 
in a certain fashion. Each full-joint probability is the product of one element from every component probability table; 
the overall calculation is equivalent to a relational database join operation with a small amount of post-processing. 
In the second step the full-joint probabilities are aggregated to compute the marginal probabilities of the queried 
events, by adding suitable joint probabilities together. This aggregation step corresponds to the relational database 
operation of the same name. For a conditional query, one aggregation must be performed for the numerator events 
and a second aggregation for the denominator events in the result table. The numerator events use the set union of
primary variables in the query's principal and conditioning sets; the denominator events use only the variables in
the conditioning set. In the third step the aggregate probability of each numerator event is divided by the aggregate 
probability of its corresponding denominator event; this is also a modified relational join operation. It emerges that 
each element of the result table is the quotient of sums of products of the original component probabilities: in other 
words a fractional polynomial in the model parameters, if each input component probability was itself a polynomial. 
When a query's conditioning set is empty, no denominator table is constructed and no division is performed; hence 
unconditional probability-table queries yield sums of products of component probabilities (thus simple polynomial 
outputs from simple polynomial inputs). 
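The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's PQL implementation: it keeps only the variables P, Q, and R (omitting A, B, and C), it evaluates the tables at exact rational parameter values rather than carrying symbolic polynomials, and the function names `joint_table`, `aggregate`, and `query` are hypothetical.

```python
from fractions import Fraction
from itertools import product

VARS = ("P", "Q", "R")  # A, B and C are left out to keep the sketch short

def joint_table(x, y, z):
    """Step 1 (join): one product of component probabilities per joint state."""
    table = {}
    for p, q, r in product((True, False), repeat=3):
        pr_p = x if p else 1 - x                 # Pr0(P)
        q_true = y if p else z                   # Pr0(Q = T | P)
        pr_q = q_true if q else 1 - q_true       # Pr0(Q | P)
        pr_r = 1 if r == ((not p) or q) else 0   # R := (P -> Q) is deterministic
        table[(p, q, r)] = pr_p * pr_q * pr_r
    return table

def aggregate(table, keep):
    """Step 2 (aggregation): add joint probabilities that agree on `keep`."""
    positions = [VARS.index(name) for name in keep]
    totals = {}
    for states, prob in table.items():
        key = tuple(states[i] for i in positions)
        totals[key] = totals.get(key, 0) + prob
    return totals

def query(table, principal, conditioning=()):
    """Step 3 (division); None marks a quotient whose denominator is zero."""
    numerator = aggregate(table, tuple(conditioning) + tuple(principal))
    if not conditioning:
        return numerator  # unconditional query: no denominator table, no division
    denominator = aggregate(table, tuple(conditioning))
    n = len(conditioning)
    return {key: (value / denominator[key[:n]] if denominator[key[:n]] != 0 else None)
            for key, value in numerator.items()}

x, y, z = Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)
table = joint_table(x, y, z)
print(query(table, ("R",))[(True,)] == 1 - x + x * y)   # compare with the result for Pr(R)
print(query(table, ("Q",), ("P", "R")))                  # keys are ordered (P, R, Q)
```

Replacing the rational numbers with polynomial objects from a computer algebra system would give the symbolic form of the same calculation; the join, aggregation, and division steps would be unchanged.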

To illustrate symbolic probability-network inference, consider the conditional probability query Pr(Q | P,R) for
the model M_PQ. The principal set is {Q} and the conditioning set is {P,R}. In order to evaluate this query using
simple table-based inference we must first compute the full-joint probability table Pr(P,Q,R,A,B,C); then compute
the numerator table Pr(P,R,Q) (which uses the set union {Q} ∪ {P,R} of variables from the principal and conditioning
sets) and the denominator table Pr(P,R); and finally divide the corresponding numerator and denominator elements by
one another. The number of elements in the full-joint table Pr(P,Q,R,A,B,C) is given by the product of the number
of states for each variable: in this case 2 × 2 × 2 × 1 × 4 × 4 which is 128. Every element in the full-joint probability
distribution is the product of several component probabilities. For example, the full-joint probability:



$$\Pr(P = F,\, Q = F,\, R = T,\, A = 3,\, B = 1,\, C = 2) \tag{6}$$

is given by the product of the corresponding component probabilities:

$$\begin{aligned}
&\mathrm{Pr}_0(P = F) \times \mathrm{Pr}_0(Q = F \mid P = F) \times \mathrm{Pr}_0(R = T \mid P = F, Q = F) \;\times \\
&\mathrm{Pr}_0(A = 3) \times \mathrm{Pr}_0(B = 1 \mid P = F, Q = F, R = T) \times \mathrm{Pr}_0(C = 2 \mid A = 3, B = 1)
\end{aligned} \tag{7}$$

Substituting the values from Table 2, this product becomes (1 − x)(1 − z)(1)(1)(1)(1), which simplifies to 1 − x − z + xz.
For the M_PQ model it happens that only 4 of the 128 full-joint probabilities are not zero:



Index | P Q R A B C | Pr(P,Q,R,A,B,C)
   13 | T T T 3 3 0 | xy
   55 | T F F 3 1 2 | x − xy
   74 | F T T 3 2 1 | z − xz
  103 | F F T 3 1 2 | 1 − x − z + xz
                                                  (8)



Here the index numbers are relative to all 128 full-joint probabilities, arranged in a particular lexicographic order. 

The numerator table Pr (P,R, Q) and the denominator table Pr (P,R) are computed by adding appropriate elements 
of this full-joint probability table. For example the marginal probability Pr (P = F,R = T) is given by the sum of the 
probabilities of the corresponding nonzero elements of the full-joint probability table (in rows 74 and 103): 



$$\Pr(P = F, Q = T, R = T, A = 3, B = 2, C = 1) + \Pr(P = F, Q = F, R = T, A = 3, B = 1, C = 2) \tag{9}$$

Substituting the polynomial values from Equation 8 yields:

$$\Pr(P = F, R = T) \;\Rightarrow\; (z - xz) + (1 - x - z + xz) \tag{10}$$

The complete tables for Pr(P,R,Q) and Pr(P,R) are shown here:



Index | P R Q | Pr(P,R,Q)
    1 | T T T | xy
    2 | T T F | 0
    3 | T F T | 0
    4 | T F F | x − xy
    5 | F T T | z − xz
    6 | F T F | 1 − x − z + xz
    7 | F F T | 0
    8 | F F F | 0

Index | P R | Pr(P,R)
    1 | T T | xy
    2 | T F | x − xy
    3 | F T | 1 − x
    4 | F F | 0
                                                  (11)

Dividing each element of the numerator Pr(P,R,Q) result table by the matching element of the denominator Pr(P,R)
result table yields the queried conditional probability table Pr(Q | P,R):

Index | P R Q | Pr(Q | P,R)
    1 | T T T | (xy) / (xy)
    2 | T T F | (0) / (xy)
    3 | T F T | (0) / (x − xy)
    4 | T F F | (x − xy) / (x − xy)
    5 | F T T | (z − xz) / (1 − x)
    6 | F T F | (1 − x − z + xz) / (1 − x)
    7 | F F T | (0) / (0)
    8 | F F F | (0) / (0)
                                                  (12)



Note that the conditioning event (P = F, R = F) is impossible; in terms of the embedded logical formulas the material
implication R := (P → Q) cannot be false if its premise P is false. Hence both probabilities Pr(Q = T | P = F, R = F) and
Pr(Q = F | P = F, R = F) conditioned on this impossible event involve division by zero. By default such exceptional
elements are not displayed; they are included above as the quotient-expression 0/0 to clarify the calculations that have
occurred. Omitting these indeterminate elements and pivoting the table to show the probabilities given each possible
condition in the same row, the same result for the query Pr(Q | P,R) is displayed as:

                  Pr(Q | P,R)
Index | P R | Q = T               | Q = F
  1,2 | T T | (xy) / (xy)         | (0) / (xy)
  3,4 | T F | (0) / (x − xy)      | (x − xy) / (x − xy)
  5,6 | F T | (z − xz) / (1 − x)  | (1 − x − z + xz) / (1 − x)
                                                  (13)




4.3 Linear Functions Follow Form 

If the primary variables P_1, P_2, ..., P_v representing embedded variables are modeled as a clique with a single parameter
specifying each probability in the joint distribution Pr_0(P_1, P_2, ..., P_v), and furthermore if there are no other parameters
in the probability network, then all inferred probabilities must be either linear functions of the parameters or quotients
of such linear functions. This special case turns out to be quite useful, as it is the natural way to model many problems
combining logic and probability. For example, if we were to modify the M_PQ model so that the prior probabilities on P
and Q were specified as this joint distribution Pr_0(P,Q) instead of as the separate probabilities Pr_0(P) and Pr_0(Q | P):



P Q | Pr_0(P,Q)
T T | x_1
T F | x_2
F T | x_3
F F | x_4
                                                  (14)



then all probabilities inferred from this revised model would be linear functions of its parameters or quotients of such 
linear functions. For example: 



P R Q | Pr(P,R,Q)
T T T | x_1
T T F | 0
T F T | 0
T F F | x_2
F T T | x_3
F T F | x_4
F F T | 0
F F F | 0

P R | Pr(P,R)
T T | x_1
T F | x_2
F T | x_3 + x_4
F F | 0

            Pr(Q | P,R)
P R | Q = T                 | Q = F
T T | (x_1) / (x_1)         | (0) / (x_1)
T F | (0) / (x_2)           | (x_2) / (x_2)
F T | (x_3) / (x_3 + x_4)   | (x_4) / (x_3 + x_4)
                                                  (15)




4.4 Handling Division by Zero 



You may have noticed that quotients such as (xy) / (xy) and (z − xz) / (1 − x) were not simplified in the results displayed
above. That is because it is important to recognize when division by zero is possible: this corresponds to the imposition 
of an impossible condition, in which case conditional probabilities are appropriately undefined. (Having established 
that there are no angels dancing on the point of a needle, there is no unique answer to what proportion are boy-angels 
versus girl-angels.) Let us consider two additional options for handling division by zero within the framework of 
parametric probability. 



DOUBLE-BACKSLASH NOTATION It is useful to have more compact notation to describe the value of a quotient
whose denominator might be zero. For this the following convention is proposed: let us say that the value of an
expression γ \\ α = β is usually γ, except that if the condition α = β holds then the value of the expression is
undefined. The double backslash \\ may be read 'unless' or 'except that it is undefined if'; you may think of it as a
distant relative of the set difference operator \. This new construction is similar to the ternary conditional operator ? :
in the C programming language: in C the expression a == b ? c : d has the value c if the condition a == b holds
and d otherwise [16].

Using this double-backslash notation the quotient xy/x would be rendered y \\ x = 0, the quotient x/x would be
rendered 1 \\ x = 0, and the quotient 0/x would be rendered 0 \\ x = 0. It is best to avoid double-backslash notation
when the undefining condition is tautological: thus 0/0 should be displayed as such (or some other designation for a
value that is always undefined) instead of as the less intuitive 1 \\ 0 = 0. By this double-backslash convention we can
display the result for the query Pr(Q | P,R) in the following way (compare with the original table in Equation 12):



Index | P R Q | Pr(Q | P,R)
    1 | T T T | 1 \\ xy = 0
    2 | T T F | 0 \\ xy = 0
    3 | T F T | 0 \\ xy = x
    4 | T F F | 1 \\ xy = x
    5 | F T T | z \\ x = 1
    6 | F T F | (1 − x − z + xz) / (1 − x)
    7 | F F T | 0/0
    8 | F F F | 0/0
                                                  (16)



Like most of the tables and figures in this document, this result table was generated automatically by the author's
computer program in response to a structured query. The current version of the program is not very good at factoring
polynomials: here it has not figured out that the quotient (1 − x − z + xz) / (1 − x) simplifies to 1 − z \\ x = 1. It
would be good for a future implementation of parametric probability analysis to be integrated with a general-purpose
computer algebra system; for reasons of time this was not done in the current version of the computer program.
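The 'unless' reading of the double-backslash notation can be mimicked with a small guard function. The sketch below is an illustration only: the helper names `unless` and `guarded_quotient` are hypothetical, `None` stands in for an undefined value, and the check uses exact rational sample values rather than symbolic factoring.

```python
from fractions import Fraction

def unless(value, lhs, rhs):
    """Value of a double-backslash expression: undefined (None) when lhs equals rhs."""
    return None if lhs == rhs else value

def guarded_quotient(numerator, denominator):
    """Quotient that stays undefined when the denominator is zero."""
    if denominator == 0:
        return None
    return Fraction(numerator) / Fraction(denominator)

z = Fraction(1, 3)
for x in (Fraction(1, 4), Fraction(1)):
    raw = guarded_quotient(1 - x - z + x * z, 1 - x)   # unsimplified quotient, as printed
    simplified = unless(1 - z, x, 1)                   # 1 - z unless x = 1
    print(x, raw, simplified)
```

At any sample point the unsimplified quotient and its double-backslash form agree, including the case x = 1 where both are undefined.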



ALTERNATIVE PARAMETRIC INDETERMINACY It may be desirable to handle division by zero in a different way,
by introducing additional parameters to encode indeterminate values while preserving the semantics that probabilities
are proportions that add up to one. For this example we might report the value of Pr(Q = T | P = F, R = F) as θ and
the value of Pr(Q = F | P = F, R = F) as 1 − θ, where θ is a new parameter subject to the constraint 0 ≤ θ ≤ 1 about
which no other constraints are allowed. In this way we would maintain the property that these two mutually exclusive
and collectively exhaustive conditional probabilities have the sum one and that the value of each probability must lie
between zero and one, but we would leave the precise value of each probability indeterminate. For example by this
parametric-indeterminacy convention we would consider the sum (xy) / (xy) + (0) / (xy) to have the definite value 1
even if it is feasible that xy = 0 (in particular when using elements of the result table for the query Pr(Q | P,R), since in
this context these fractional polynomial values would describe the probabilities of mutually exclusive and collectively
exhaustive events).






Logical | Polynomial in ℝ[P,Q] | Polynomial in F_2[P,Q] | Description
T       | 1                    | 1                      | Elementary truth
F       | 0                    | 0                      | Elementary falsity
¬P      | 1 − P                | 1 + P                  | Negation (NOT)
P ∧ Q   | PQ                   | PQ                     | Conjunction (AND)
P ⊕ Q   | P + Q − 2PQ          | P + Q                  | Exclusive disjunction (XOR)
P ∨ Q   | P + Q − PQ           | P + Q + PQ             | Inclusive disjunction (OR)
P → Q   | 1 − P + PQ           | 1 + P + PQ             | Material implication
P ↔ Q   | 1 − P − Q + 2PQ      | 1 + P + Q              | Biconditional (XNOR)
P ↑ Q   | 1 − PQ               | 1 + PQ                 | Nonconjunction (NAND)
P ↓ Q   | 1 − P − Q + PQ       | 1 + P + Q + PQ         | Nondisjunction (NOR)

Table 3 Boolean representation of logical formulas, illustrated for propositional variables P and Q and for polynomials
with real or finite-field coefficients. Using real-number coefficients Boole's 'special law' constraints P^2 = P and Q^2 = Q are
necessary to limit the possible values of each variable to {0, 1}.



4.5 Boolean Polynomials and Coincident Probabilities 

Let us briefly review Boole's polynomial notation for logical formulas. The mappings between Boole's polynomials 
and what is now standard logical notation (our mash-up from Hilbert, Peano, and others) are shown in Table 3. These
are sometimes called the 'Stone isomorphisms' after ||3D . Despite the convenient phrase 'Boolean translation' it 
should be noted that Boole did not use polynomials to translate from some other conventional system of symbolic 
notation for logic — for in his time there was no such convention. (Frege's Begriffsschrift was published in 1879, many 
years after Boole's death in 1864; likewise Boole's lifetime predated the works of Peano and Hilbert in which much 
of modern logical notation was developed 0.) Nonetheless, when viewed as translation, what Boole described turns 
out to be a special case of Lagrange polynomial interpolation (6). 

Contrary to a very common misrepresentation, Boole's polynomials used ordinary integer coefficients for which 
1 + 1 = 2. However there are some advantages to using instead coefficients in the finite field F_2 of order 2, which uses
integer arithmetic modulo 2 (hence 1 + 1 = 0, addition and subtraction are the same operation, and each value is its
own additive inverse). With coefficients in F_2 the polynomials that represent logical formulas are simpler in form and
the number of distinct polynomials is finite (given a finite set of propositional variables). Table 3 includes mappings
from conventional logical formulas to polynomials with coefficients in the binary finite field as well as to polynomials 
with coefficients in the real numbers. 
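Both columns of polynomials can be checked mechanically against the ordinary truth tables. The following Python sketch does so for the eight connectives with two-valued inputs; the dictionary name `TABLE_3`, its short keys, and the `mismatches` helper are illustrative choices, and arithmetic modulo 2 on ordinary integers stands in for the field F_2.

```python
from itertools import product

# name -> (truth function, real-coefficient polynomial, polynomial over F_2)
TABLE_3 = {
    "NOT":  (lambda p, q: not p,         lambda P, Q: 1 - P,                 lambda P, Q: 1 + P),
    "AND":  (lambda p, q: p and q,       lambda P, Q: P * Q,                 lambda P, Q: P * Q),
    "XOR":  (lambda p, q: p != q,        lambda P, Q: P + Q - 2 * P * Q,     lambda P, Q: P + Q),
    "OR":   (lambda p, q: p or q,        lambda P, Q: P + Q - P * Q,         lambda P, Q: P + Q + P * Q),
    "IMPL": (lambda p, q: (not p) or q,  lambda P, Q: 1 - P + P * Q,         lambda P, Q: 1 + P + P * Q),
    "XNOR": (lambda p, q: p == q,        lambda P, Q: 1 - P - Q + 2 * P * Q, lambda P, Q: 1 + P + Q),
    "NAND": (lambda p, q: not (p and q), lambda P, Q: 1 - P * Q,             lambda P, Q: 1 + P * Q),
    "NOR":  (lambda p, q: not (p or q),  lambda P, Q: 1 - P - Q + P * Q,     lambda P, Q: 1 + P + Q + P * Q),
}

def mismatches():
    """List every (name, P, Q) at which a polynomial disagrees with the truth table."""
    bad = []
    for name, (truth, real_poly, f2_poly) in TABLE_3.items():
        for p, q in product((0, 1), repeat=2):
            expected = int(bool(truth(p, q)))
            # Real coefficients use ordinary arithmetic; F_2 reduces the sum modulo 2.
            if real_poly(p, q) != expected or f2_poly(p, q) % 2 != expected:
                bad.append((name, p, q))
    return bad

print(mismatches())
```

An empty list means that every polynomial reproduces its connective on all four input combinations.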

Boole's polynomial notation for logical formulas is often understood in a monolithic way but it was actually the 
expression of two different ideas: first, the idea that classical logical operations (conjunction, disjunction, negation, 
and so on) are equivalent to certain combinations of ordinary arithmetical operations, when elementary truth values are 
represented as ordinary numbers; and second, the idea that the probability that a logical formula is true is a polynomial 
function of the probabilities that its constituent propositional variables are true. In a special case the polynomial that 
denotes a logical formula coincides with the polynomial that expresses the probability that the formula itself is true. 
It is important to recognize the independence property required for this coincidence, and to generalize a means to 
compute appropriate probabilities in the case that this independence property does not hold (Boole did both). 



COINCIDENCE FROM INDEPENDENT PROPOSITIONAL VARIABLES When the propositional variables in use are
modeled as probabilistically independent of one another (as described in Section 3.1), then the probability that any
compound formula is true coincides with its Boolean polynomial representation (using real coefficients). For example,
as shown in Table 3, the Boolean representation of the logical formula X ↔ Y is the polynomial 1 − X − Y + 2XY. This
Boolean coincidence principle provides that if X and Y are independent in the probability model, with Pr_0(X = T) := x
and Pr_0(Y = T) := y, then the probability Pr((X ↔ Y) = T) that the compound formula X ↔ Y is true has the value
1 − x − y + 2xy that mirrors the Boolean polynomial representation 1 − X − Y + 2XY of this compound formula. This
coincidence is not generally present when the propositional variables in use are probabilistically correlated. For example,
if Pr(X = T) := x, Pr(Y = T | X = T) := y, and Pr(Y = T | X = F) := z, then the probability Pr((X ↔ Y) = T)
has the value 1 − x − z + xy + xz. This value is different from 1 − x − y + 2xy exactly when y ≠ z and x ≠ 1 (the two
values differ by (y − z)(1 − x)), in other words when the probabilities of X and Y are nontrivially correlated.
The general symbolic probability-network inference method
discussed in this section computes correct answers with or without the assertion that the propositional variables are
independent.
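The two values quoted for Pr((X ↔ Y) = T) can be reproduced by adding the probabilities of the two joint states in which X and Y agree. This is a small numeric check in Python at exact rational sample values; the function name `pr_biconditional` is an illustrative assumption.

```python
from fractions import Fraction

def pr_biconditional(x, y_if_x_true, y_if_x_false):
    """Pr((X <-> Y) = T): add the two joint states in which X and Y agree."""
    both_true = x * y_if_x_true
    both_false = (1 - x) * (1 - y_if_x_false)
    return both_true + both_false

x, y, z = Fraction(1, 3), Fraction(1, 2), Fraction(1, 5)
independent = pr_biconditional(x, y, y)   # Y ignores X, so both conditionals equal y
correlated = pr_biconditional(x, y, z)
print(independent == 1 - x - y + 2 * x * y)
print(correlated == 1 - x - z + x * y + x * z)
print(correlated - independent == (y - z) * (1 - x))
```

The last comparison shows where the coincidence fails: the gap between the two values is (y − z)(1 − x).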

5 SECONDARY ANALYSIS: ALGEBRA, OPTIMIZATION, AND SEARCH 

The results generated by symbolic probability-network inference are always algebraic functions of the model param- 
eters; more specifically they must be polynomials or quotients of polynomials, when the domain of each primary 
variable is finite and when each input component probability is itself a polynomial. These computed polynomials are 
ordinary mathematical objects that can be manipulated by ordinary mathematical methods. Boole focused on opti- 
mization as the secondary analysis that followed his symbolic probability inference. In addition to this very useful 
technique, we can broaden the scope of secondary analysis to include more general applications of symbolic and nu- 
merical analysis to the polynomials generated by symbolic probability-network inference. Here we consider algebra, 
optimization, and search. 

5.1 Algebra with Polynomials 

Perhaps the simplest kind of secondary analysis for the fractional polynomials computed by symbolic probability- 
network inference is elementary algebra: these formulas can be added, subtracted, multiplied, and divided to form 
new fractional polynomials. For example, consider the difference between the probability that the logical formula
P → Q holds and the conditional probability that Q is true given that P is true. Since the model M_PQ uses the
definition R := (P → Q), the requisite difference is Pr(R = T) − Pr(Q = T | P = T). To compute this we first select
the appropriate elements from the result tables presented in Section 4.1:

pqlsh> $tr item 1
1 - x + x*y
pqlsh> $tqp item 1
(x*y) / (x)

We then ask pqlsh to compute the difference (as a human or computer algebra system could easily do):

pqlsh> pql::expr "[$tr item 1] - ([$tqp item 1])"
(x - x^2 - x*y + x^2*y) / (x)
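For readers who want to check this quotient by hand, the subtraction can be written out step by step. The lines below are a worked restatement of the pqlsh result, valid wherever x ≠ 0, and are not additional program output:

```latex
\begin{align*}
\Pr(R = T) - \Pr(Q = T \mid P = T)
  &= (1 - x + xy) - \frac{xy}{x} \\
  &= \frac{x\,(1 - x + xy) - xy}{x} \\
  &= \frac{x - x^2 - xy + x^2 y}{x} \\
  &= \frac{x\,(1 - x)(1 - y)}{x}
\end{align*}
```

The common factor x is left uncancelled on purpose, so that the possibility of division by zero at x = 0 stays visible.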



Rewriting this difference with TeX formatting and the double-backslash notation from Section 4.4 we have:

$$\Pr(R = T) - \Pr(Q = T \mid P = T) \;\Rightarrow\; 1 - x - y + xy \;\mathbin{\backslash\backslash}\; x = 0 \tag{17}$$

It follows from Equation 17 that the two quantities Pr(R = T) and Pr(Q = T | P = T) are different unless x = 1 or
y = 1 or both (with the caveat that the difference is undefined when x = 0). Perhaps this is clearer when the polynomial
difference in Equation 17 is factored as (1 − x)(1 − y)(x/x), which plainly has roots x = 1 and y = 1. In terms of the
example, the probability that the statement 'bird implies flight' is true differs from the conditional probability of flight
given bird, unless the creature is certainly a bird (x = 1) or it is certain that all birds can fly (y = 1) or both. However
the difference is undefined if the creature could not possibly be a bird (x = 0), because in that case the requested
condition P = T that the creature is a bird is impossible.

As another example of secondary analysis by algebra, let us calculate the expectation E(B): the mean value of
the number B of True propositions among {P, Q, R}. For this we begin with the result for the probability-table query
Pr(B), computed as described in Section 4.2:

Index | B | Pr(B)
    1 | 0 | 0
    2 | 1 | 1 − z − xy + xz
    3 | 2 | z − xz
    4 | 3 | xy
                                                  (18)



The expected value E(B) is then computed in the usual way as the sum of each possible value b_i ∈ {0, 1, 2, 3} of B
weighted by the probability that it is attained: Σ_i [b_i × Pr(B = b_i)]. Thus we calculate:

$$0 \times (0) + 1 \times (1 - z - xy + xz) + 2 \times (z - xz) + 3 \times (xy) \;\Rightarrow\; 1 + z + 2xy - xz \tag{19}$$

which establishes using simple algebra that the expected value E(B) is the polynomial 1 + z + 2xy − xz.
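The weighted sum can be spot-checked numerically. The Python sketch below copies the four entries of the result table for Pr(B) and evaluates them at exact rational sample values; the helper names `pr_b` and `expectation` are illustrative.

```python
from fractions import Fraction

def pr_b(x, y, z):
    """Distribution of B, copied from the result table for Pr(B)."""
    return {0: 0, 1: 1 - z - x * y + x * z, 2: z - x * z, 3: x * y}

def expectation(distribution):
    """Sum of each value weighted by its probability."""
    return sum(value * prob for value, prob in distribution.items())

x, y, z = Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)
dist = pr_b(x, y, z)
print(sum(dist.values()) == 1)                          # the probabilities add up to one
print(expectation(dist) == 1 + z + 2 * x * y - x * z)   # agrees with the closed form
```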



5.2 Polynomial Optimization 

Next we consider secondary analysis using optimization (mathematical programming). The polynomials computed 
by symbolic probability-network inference, and the additional polynomials derived by algebraic calculation, can be 
used as constraints and objective functions in optimization problems. In Laws of Thought, Boole routinely sought to 
calculate the minimum and maximum feasible values of some polynomial objective computed by symbolic probability- 
network inference, subject to constraints reflecting the laws of probability. However, Boole's 19th-century optimiza- 
tion methods were not very robust. Polynomial optimization problems are still challenging to solve; they are generally 
nonlinear and nonconvex, and therefore they can have local solutions that are not global solutions. But there are 
much better global optimization algorithms now, including reformulation-linearization and semidefinite programming 
techniques [29] [19]. Specifically for the application of parametric probability analysis, the author has developed a
new reformulation and linearization algorithm that computes interval bounds on the global solutions to multivariate
polynomial optimization problems, using mixed integer-linear program approximations; the user controls the tightness
of the bounds and the time required to compute them by setting the number of reformulation variables [24].

To illustrate optimization as secondary analysis, let us consider the minimum and maximum feasible values of the
expectation E(B) given in Equation 19, with the added constraint that there is a 75% chance or less that the statement
'bird implies flight' is true: Pr(R = T) ≤ 0.75. Recalling the specification of the M_PQ probability model given
in Tables 1 and 2, we include the constraints 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 ≤ z ≤ 1. For the additional constraint
regarding Pr(R = T) we recall from Section 4.1 that Pr(R = T) evaluates to 1 − x + xy. The resulting inequality
1 − x + xy ≤ 0.75 simplifies to 0.25 + xy ≤ x. Hence to find minimum and maximum bounds on E(B) we must solve
the paired polynomial optimization problems:

$$\begin{array}{llcll}
\text{minimize:} & 1 + z + 2xy - xz & \qquad & \text{maximize:} & 1 + z + 2xy - xz \\
\text{subject to:} & 0.25 + xy \le x & & \text{subject to:} & 0.25 + xy \le x \\
\text{and:} & 0 \le x \le 1 & & \text{and:} & 0 \le x \le 1 \\
 & 0 \le y \le 1 & & & 0 \le y \le 1 \\
 & 0 \le z \le 1 & & & 0 \le z \le 1
\end{array} \tag{20}$$



Using a small number of reformulation variables, the author's bounded global polynomial optimization solver computes
that the global minimum lies in the interval [0.938, 1.000] and that the global maximum lies in the interval
[2.500, 2.594]. With more reformulation variables the solver generates tighter intervals [1.000, 1.000] and [2.500, 2.500].
The solver reports that the global minimum 1.000 is achieved at the point (x = 0.984, y = 0.000, z = 0.000) and that
the global maximum 2.500 is achieved at the point (x = 1.000, y = 0.750, z = 0.266). You may notice by inspection
of Equation 19 that, absent any constraints besides the bounds on the parameters x, y, and z, the global minimum of
the expected value E(B) is exactly 1 (when x = y = z = 0) and its global maximum is exactly 3 (when x = y = z = 1).
The constraint Pr(R = T) ≤ 0.75 added for this example has rendered some of that range infeasible.

Table 4 Instantiations of Pr(B* = 0) and Pr(B* = 2) calculated by substituting the listed values of (t_1, t_2, t_3, t_4) into the
corresponding polynomials from Equation 21. In row 10 both instantiated polynomials are identically zero.
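As a rough cross-check on the paired optimization problems in Equation 20, the objective can be evaluated over a coarse grid of feasible points. This Python sketch is plain grid enumeration, not the author's reformulation-linearization solver: in general a grid can only report values that are attained at its own points, so it bounds the true minimum from above and the true maximum from below. The function names and the step count are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def expected_b(x, y, z):
    """Objective: E(B) from Equation 19."""
    return 1 + z + 2 * x * y - x * z

def feasible(x, y, z, cap=Fraction(3, 4)):
    """Constraint Pr(R = T) <= 0.75, with Pr(R = T) = 1 - x + x*y."""
    return 1 - x + x * y <= cap

def grid_bounds(steps=20):
    """Smallest and largest objective values over the feasible grid points."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    values = [expected_b(x, y, z)
              for x, y, z in product(grid, repeat=3) if feasible(x, y, z)]
    return min(values), max(values)

low, high = grid_bounds()
print(float(low), float(high))
```

With 20 steps per axis the grid contains the points y = 0.75 and x = 1, which is why this particular grid can reach the reported optimum rather than merely approach it.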



5.3 General Search 



Providing yet another mode of secondary analysis, polynomials generated by symbolic probability-network inference 
can be used in general computer-science search problems that might be awkward to formulate in terms of algebraic 
equations or numeric optimization. To illustrate, note that in the model M_PQ it is impossible for all three propositions
P, Q, and R to be false; as Equation 11 shows, the joint probability Pr(P = F, R = F, Q = F) evaluates to zero. This
makes semantic sense based on the definition R := (P → Q), for if the premise P is false then the material implication
P → Q must be true; hence both propositions cannot be false simultaneously. Let us search for a logical proposition
that has a different property: a logical function R* of P and Q such that the number of true propositions among
{P, Q, R*} must be odd (either 1 or 3).

To set up this search let us replace the original component probability table Pr(R | P,Q) shown in Table 2 part (c)
with the table for Pr(R* | P,Q) given in Equation 5, in which the probabilities of each value of R* given each combination
of values for P and Q are encoded by the parameters t_1, t_2, t_3, and t_4 as described in Section 3.3. Using this
replacement parametric table and symbolic probability-network inference as in Section 4.2, we compute the probability
distribution on the number B* of true propositions among {P, Q, R*}:



B* | Pr(B*)
 0 | 1 − x − z − t_4 + xz + x t_4 + z t_4 − xz t_4
 1 | x + z + t_4 − xy − xz − x t_2 − x t_4 − z t_3 − z t_4 + xy t_2 + xz t_3 + xz t_4
 2 | xy + x t_2 + z t_3 − xy t_1 − xy t_2 − xz t_3
 3 | xy t_1
                                                  (21)



Now we desire to find the values of (t_1, t_2, t_3, t_4) for which the only possible values of B* are 1 and 3. In other words, we
require that the polynomials Pr(B* = 0) and Pr(B* = 2) are both identically zero after substituting the selected values
of (t_1, t_2, t_3, t_4). Considering each value t_i ∈ {0, 1} there are 2^4 ⇒ 16 possible values of the vector (t_1, t_2, t_3, t_4). For this
small problem we simply enumerate every possibility and substitute these values into the polynomials in the result table
for Pr(B*) given in Equation 21. (Of course the point of most search algorithms is to avoid exhaustive enumeration of
the search space; we eschew such luxury for now.) In Table 4 the table on the left gives the instantiations of Pr(B* = 0)
and the table on the right gives the instantiations of Pr(B* = 2) at all 16 possible values of (t_1, t_2, t_3, t_4).

Comparing these tables you can see that only in row 10 are both polynomials identically zero. In every other
case it is either possible that Pr(B* = 0) is not zero, that Pr(B* = 2) is not zero, or that both probabilities are not
zero. Row 10 corresponds to the vector (t_1, t_2, t_3, t_4) = (1, 0, 0, 1) and in turn to the logical formula R*_1001 := (P ↔ Q)
that combines P and Q using the biconditional operator (see Section 3.3). Here is the probability distribution on the
number B*_1001 of true propositions among the set {P, Q, R*_1001}, which is obtained from substituting the solution values
(t_1, t_2, t_3, t_4) = (1, 0, 0, 1) into Equation 21:

B*_1001 | Pr(B*_1001)
      0 | 0
      1 | 1 − xy
      2 | 0
      3 | xy
                                                  (22)



Thus by applying general search as secondary analysis, we have computed that the number of true propositions among
the set {P, Q, P ↔ Q} must be odd; it cannot happen that exactly 0 or 2 of these propositions are true, regardless of the
prior probabilities x, y, and z. Moreover, search has demonstrated that any other formula R* with this odd-number
property must have the same truth table as P ↔ Q. For a formula with any other truth table there would be feasible
values of the parameters (x,y,z) for which Pr(B* = 0) > 0, Pr(B* = 2) > 0, or both.
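The exhaustive search is small enough to restate in a few lines of Python. The sketch below works directly on truth tables instead of substituting into the polynomials of Equation 21; the two views agree when every combination of values for P and Q can have positive probability. The row numbers assume that the 16 vectors are listed in lexicographic order starting from (0, 0, 0, 0), which is an assumption about the ordering used in Table 4, and the function name is illustrative.

```python
from itertools import product

# (P, Q) states in the order governed by t1, t2, t3, t4
STATES = ((True, True), (True, False), (False, True), (False, False))

def odd_count_vectors():
    """Return (row, vector) for each 0/1 vector that forces an odd count of true propositions."""
    found = []
    for row, t in enumerate(product((0, 1), repeat=4), start=1):
        # Count the true propositions among P, Q and R* in each of the four states.
        counts = [int(p) + int(q) + r_star for (p, q), r_star in zip(STATES, t)]
        if all(count % 2 == 1 for count in counts):
            found.append((row, t))
    return found

print(odd_count_vectors())
```

Exactly one vector survives the parity test, which matches the conclusion that only the truth table of the biconditional has the odd-number property.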



6 ADDITIONAL MODELING ISSUES 



Having discussed primary and secondary analysis of parametric probability networks, there are two additional mod- 
eling issues to consider. First, there are two different techniques to model conditions during parametric probability 
analysis: as denominator events in conditional probability queries and as constraints in optimization problems. Second, 
it is possible to encode some statements about implication directly as conditional probabilities, without the intermedi- 
ate device of embedded formulas from the propositional calculus. Using such direct probability encoding, constraints 
on conditional probabilities can express quantification without the need for classical logical quantifiers and with the 
option to specify more precise fractional values than just 'some'. 



6.1 Subjunctive Conditions, Imperative Constraints 

Problems in logic and probability commonly involve conditions, and there are two idioms for representing conditions 
during parametric probability analysis. Using the subjunctive idiom, a condition is modeled as the denominator event 
in a conditional probability-table query. Using the imperative idiom, a condition is modeled as an equality constraint 
in an optimization problem. These alternative formulations have slightly different semantics. In the subjunctive 
formulation we ask hypothetically what would be the probability of some event, if the stated condition were to hold. In 
the imperative formulation we assert factually that the stated condition must hold, and then ask what is the probability 
of some event under this necessary condition. These two idioms yield essentially the same solutions, although they 
report the exception that the stated condition is impossible in two different ways: in the subjunctive formulation an 
impossible condition causes division by zero, but in the imperative formulation an impossible condition produces an 
unsatisfiable system of equations. Both conditioning idioms can be used for parametric probability networks with or 
without embedded logical formulas. 



EXAMPLE: TWO MODES OF MODUS PONENS There are two different ways to express the familiar phenomenon of
modus ponens with a parametric probability network, and these correspond to two slightly different questions. First
we might ask in a subjunctive mood: if P and $P \to Q$ were to be true, what would be the probability that Q is also true?
Second we might specify in an indicative mood that P and $P \to Q$ must certainly be true, and then ask the probability
that Q is also true. Recall that in model $\mathcal{M}_{pq}$ the variable R stands for the proposition $P \to Q$.

For the subjunctive formulation we use a probability-table query to compute Pr (Q | P, R), the result of which is
shown in Equation 12. The first element of this result table gives the desired conditional probability:

$$ \Pr(Q = T \mid P = T, R = T) \;\Rightarrow\; (xy)/(xy) \tag{23} $$

For this problem it happens that every polynomial in Table [i] that is not identically zero has a value that is strictly greater than zero at some 
feasible value of the parameters (x,y,z). In general it would be necessary to solve an optimization problem to verify that a polynomial which is 
symbolically different from zero indeed has a feasible value greater than zero, taking into account all of the constraints provided. 
The value of this quotient is one unless its denominator is zero, that is, unless xy = 0 (see the notation of Section 4.4).
We interpret this to mean that if both propositions P and $P \to Q$ were to be true, then Q would also be true
with probability 1 (unless x = 0 or y = 0 or both, in which exceptional cases the condition {P = T, R = T} would be
impossible and thus the requested conditional probability would be the indeterminate expression 0/0). In other words,
if the creature happens to be a bird, and if it happens to be true that bird implies flight, then it would certainly be true
that the creature can fly. However, if it were already known a priori that it is impossible for the creature to be a bird,
or that there is no chance that a bird can fly, then the question of flight assuming the stated conditions would not
have a definite answer because the conditions would be impossible.
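The subjunctive query can be imitated numerically with exact rational arithmetic. This is a small sketch rather than the PQL computation; it assumes the parameter meanings x = Pr0(P = T), y = Pr0(Q = T | P = T), and z = Pr0(Q = T | P = F), and the function names are hypothetical.

```python
from fractions import Fraction

def joint(x, y, z):
    """Joint distribution over (P, Q) built from the assumed input parameters."""
    return {
        (True, True): x * y,
        (True, False): x * (1 - y),
        (False, True): (1 - x) * z,
        (False, False): (1 - x) * (1 - z),
    }

def implies(p, q):
    """Material implication P -> Q."""
    return (not p) or q

def pr_q_given_p_and_r(x, y, z):
    """Pr(Q=T | P=T, R=T) with R := (P -> Q); None when the condition is impossible."""
    dist = joint(x, y, z)
    den = sum(w for (p, q), w in dist.items() if p and implies(p, q))
    num = sum(w for (p, q), w in dist.items() if q and p and implies(p, q))
    if den == 0:
        return None  # the indeterminate case 0/0
    return num / den
```

For any parameter values with x > 0 and y > 0 the function returns exactly 1; with x = 0 or y = 0 it returns None, mirroring the division-by-zero exception.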

Alternatively, to use the indicative formulation, we build an optimization query in which we ask the minimum 
feasible value of the objective Pr (Q = T) subject to the following constraints that P and R must certainly be true: 

Pr(P = T) = 1 (24)
Pr(R = T) = 1 (25)

We specify the constraint on P using the first element of its result table (the output Pr (P) happens to be the same as
the input $\Pr_0(P)$ shown in Table 2) and the constraint on R from the first element of Pr (R) shown in Section 4.1:



pqlsh> set cp "[[[$ml table P] infer] item 1] == 1" 
x == 1 

pqlsh> set cr " [$tr item 1] == 1" 
1 - x + x*y == 1 

The second constraint simplifies to x = xy. Moving on, we obtain the objective function as the first element of the 
result table for the query Pr (Q): 

pqlsh> set tq [[$ml table Q] infer] ; $tq print; 
Q | Pr( {Q} )
T | z + x*y - x*z
F | 1 - z - x*y + x*z

Our optimization query is a request for a polynomial program using this objective function and these constraints: 

pqlsh> set pq [$ml pprog -min " [$tq item 1] " $cp $cr] ; return; 
This optimization query generates the following polynomial optimization problem: 

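$$
\begin{array}{ll}
\text{minimize:} & z + xy - xz \\
\text{subject to:} & x = 1 \\
 & x = xy \\
\text{and:} & 0 \le x \le 1 \\
 & 0 \le y \le 1 \\
 & 0 \le z \le 1
\end{array}
\tag{26}
$$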
Solving this problem gives bounds on the minimum feasible value of Pr (Q = T) under the constraints Pr (P = T) = 1
and Pr (R = T) = 1:

pqlsh> $pq solve; $pq solution
1.000 1.000
pqlsh> $pq point
{x = 1.000} {y = 1.000} {z = 0.000}

This solution to the optimization problem in Equation 26 demonstrates that proposition Q must be true (that is, the
variable Q attains the state T with minimum probability in the interval [1.000, 1.000]) if P and $P \to Q$ are constrained
to be true. (You can see by inspection that the global solution to Equation 26 is exactly 1.) In other words, if it is
certainly true that the creature is a bird, and it is certainly true that bird implies flight, then it is certainly true that the
creature can fly. The exceptional cases x = 0 and y = 0 are handled differently in this formulation: instead of causing
division by zero, they identify infeasible points relative to the constraints in Equation 26.
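The inspection mentioned above can be written out as a short derivation. It is only a worked restatement of the constraints already given and introduces no new symbols: the first constraint fixes x, the second then fixes y, and the objective collapses to a constant.

```latex
\begin{align*}
x = 1 \ \text{and}\ x = xy &\implies 1 = y, \\
\bigl(z + xy - xz\bigr)\big|_{x = 1,\; y = 1} &= z + 1 - z = 1
  \quad \text{for every } z \in [0, 1].
\end{align*}
```

The same substitution shows why the exceptional values are infeasible: x = 0 contradicts x = 1, and y = 0 together with x = 1 would turn x = xy into 1 = 0.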



6.2 Direct Probability Encoding 

It is possible to model implication and quantification directly in parametric probability networks, without using the
classical logical devices of material implication or universal and existential quantifiers. We have already seen the
statement of material implication $P \to Q$ used to model the idea that Q is a necessary consequence of P (with subjunctive
and indicative idioms to impose the condition that this statement of material implication is true). We could instead
constrain the conditional probability Pr (Q = T | P = T) to 1 to express the idea that Q is a necessary consequence
of the premise P. Similarly the constraint Pr (Q = T | P = T) = 0 is an alternative way to model the assertion that Q is
never a consequence of P.

These equality constraints on conditional probabilities are alternatives to universally-quantified statements such as
$\forall \alpha\,(P(\alpha) \to Q(\alpha))$ to say that all P are Q; or $\forall \alpha\,(P(\alpha) \to \neg Q(\alpha))$ to say that no P are Q. In a similar fashion, to
model the assertion that Q sometimes follows P we could constrain the relevant conditional probability to be strictly
greater than zero: Pr (Q = T | P = T) > 0. And to model the assertion that Q sometimes does not follow P we could
constrain the relevant conditional probability to be strictly less than one: Pr (Q = T | P = T) < 1. These inequality
constraints on conditional probabilities are alternatives to existentially-quantified formulas such as $\exists \alpha\,(P(\alpha) \to Q(\alpha))$
or $\exists \alpha\,(P(\alpha) \wedge Q(\alpha))$ to say that some P are Q; or $\exists \alpha\,(P(\alpha) \wedge \neg Q(\alpha))$ to say that some P are not Q. In the framework
of parametric probability, quantified variables like $\alpha$ are distinct both from primary variables and from parameters.



Recall from the secondary analysis in Section 5.1 that the conditional probability that Q is true given that P is true is
mathematically distinct from the probability that the material-implication statement $P \to Q$ is true. These correspond
to symbolically different polynomials; the arithmetical difference between them depends on the prior probabilities
assigned to the events P and Q as reported in Equation 17.



THE OPTION OF EXISTENTIAL IMPORT The constraints introduced to quantify propositions can be specified such 
that they do or do not have existential import, as the user desires. In general, constraining input component probabil- 
ities specified by the user can have different effects from constraining output probabilities computed by the system; 
existential import is one of those mutable effects. Elementary algebra helps to clarify the consequences of various 
polynomial constraints. 

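To illustrate, here are the input component probability table $\Pr_0(Q \mid P)$ from Table 2 part (b); the computed table
Pr (Q | P) for this same conditional probability, created as in Section 4.2; and the computed probability table Pr (P):

$$
\Pr_0(Q \mid P):\quad
\begin{array}{c|cc}
P & Q = T & Q = F \\ \hline
T & y & 1 - y \\
F & z & 1 - z
\end{array}
$$

$$
\Pr(Q \mid P):\quad
\begin{array}{c|cc}
P & Q = T & Q = F \\ \hline
T & (xy)/(x) & (x - xy)/(x) \\
F & (z - xz)/(1 - x) & (1 - x - z + xz)/(1 - x)
\end{array}
$$

$$
\Pr(P):\quad
\begin{array}{c|c}
P & \Pr(P) \\ \hline
T & x \\
F & 1 - x
\end{array}
\tag{27}
$$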



Now, adding the constraint y > 0 that the input value $\Pr_0(Q = T \mid P = T)$ must be strictly greater than zero would
specify that Q is sometimes a consequence of P (when P happens to be true) without asserting that P is ever true.
For if x = 0 then the computed probability Pr (P = T) would be zero even if y > 0. Thus the input-value constraint
$\Pr_0(Q = T \mid P = T) > 0$, meaning y > 0, does not affect the values in Pr (P); it has no existential import. Incidentally,
in the case x = 0 the inferred conditional probability Pr (Q = T | P = T), calculated by the laws of probability as the
quotient Pr (P = T, Q = T) / Pr (P = T), would be indeterminate due to division by zero, regardless of the value y
assigned to the corresponding input conditional probability $\Pr_0(Q = T \mid P = T)$. For conditional probabilities, it is
essential to distinguish between input and output values; what is computed by the system may differ from what was
input by the user (particularly regarding denominators that may be zero).

On the other hand, adding the constraint (xy) / (x) > 0 that the output value Pr (Q = T | P = T) is strictly greater
than zero would indeed carry existential import. Since both parameters x and y are constrained to be nonnegative, by
simple algebra this constraint would require the product xy in the numerator to be strictly greater than zero, which
would in turn require both x > 0 and y > 0. Consequently the output value Pr (P = T), which has the polynomial
value x, would also be required to be strictly greater than zero. So the output-value constraint Pr (Q = T | P = T) > 0,
meaning (xy) / (x) > 0, would also assert Pr (P = T) > 0; it has existential import.
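The difference between the two kinds of constraint can be made concrete with a pair of predicates. This is an illustrative sketch only, with hypothetical function names; it treats the undefined quotient at x = 0 as a failure of the output-value constraint, which is one reasonable reading of the division-by-zero case.

```python
from fractions import Fraction

def input_constraint_holds(x, y):
    """Input-value constraint Pr0(Q=T | P=T) > 0, i.e. y > 0; it never inspects x."""
    return y > 0

def output_constraint_holds(x, y):
    """Output-value constraint (xy)/(x) > 0; treated as unsatisfied when x = 0."""
    if x == 0:
        return False  # quotient is 0/0, so the constraint cannot be certified
    return (x * y) / x > 0
```

With x = 0 and y = 1/2 the input-value constraint holds while the output-value constraint does not, which is the sense in which only the latter carries existential import.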

It is always possible to constrain output probabilities; but which input probabilities are available to constrain
depends on the graph structure of the probability network. For example the presence of an input table $\Pr_0(Q \mid P)$
requires that P is a parent of Q in the network graph. If instead P and Q were joined in a clique then their joint probability
distribution $\Pr_0(P, Q)$ would be specified as the input component probability table; in this case there would be no
separate table $\Pr_0(Q \mid P)$ of input values to be constrained.

As an implementation detail many optimization solvers do not distinguish between strict and weak inequality
constraints. To work around this limitation we can use some small positive constant $\varepsilon$ and weak inequality constraints
$p \ge \varepsilon$ and $p \le 1 - \varepsilon$ to approximate the strict inequalities p > 0 and p < 1 for a polynomial of interest p. For example
using $\varepsilon = 0.1$ the constraint $y \ge 0.1$ approximates the strict inequality $\Pr_0(Q = T \mid P = T) > 0$. This approximation
scheme allows common optimization solvers to compute results that reliably distinguish between probabilities that are
exactly zero, those that are exactly one, and those that have some intermediate value.

FRACTIONAL QUANTIFICATION Using probability directly to model quantification offers an important benefit: we 
are not limited to the classical quantifiers 'all' and 'some'. Instead it is possible to describe and to constrain the 
precise proportion of cases for which some logical formula is true or false. In other words, besides the constraints 
p = 0, p = 1, p > 0, and p < 1 that the probability p encoding some statement of quantification is equal to zero, equal 
to one, strictly greater than zero, or strictly less than one, it is possible to specify arbitrary polynomial constraints on 
p. Thus in addition to statements like 'all P are Q' or 'some P are not Q' we can model such assertions as 'exactly c
percent of P are Q' or 'between a and b percent of P are Q' or 'if there are any P, then twice as many P are Q as P are
R'. In certain cases, the requisite constraints are guaranteed to be linear in the model parameters.

As a philosophical aside, probability is best understood as the proportion of some underlying basic measure. There 
is diversity in what that basic measure can represent: number or cardinality (in which case the proportion is frequency); 
the absolute weight of subjective belief or of causal propensity (in which case the proportion is subjective probability); 
monetary value; mass; or some other property. It is essential for the property chosen as a basic measure to be additive 
across set unions of measured events, which is the quintessential mathematical property of a measure. In the course 
of contemplating proportional statements of quantification more precise than 'all' and 'some', and for reasoning with 
Nilsson-style fractional truth values, it may be worthwhile to clarify what the basic measure is intended to mean. 

7 ANALYSIS OF SELECTED PROBLEMS 

Now let us apply parametric probability analysis to an assortment of problems from the literature, each of which has 
been advertised as being difficult or impossible to solve by formal mathematical methods. We shall see that several 
well-known problems — about card games with logical rules, axioms with uncertainty, counterfactual conditions, and 
truthful knights and lying knaves — are nothing more than parametric probability problems. Such problems are easily 
solved by parametric probability analysis; many of them turn out to be linear optimization problems whose constraints 
and objectives are the solutions to probability queries. 

7.1 Johnson-Laird's Winning Hand 

Let us begin with a problem from Johnson-Laird that was also discussed by Bringsjord [13, 4]:

If one of the following assertions is true then so is the other:

1. There is a king in the hand if and only if there is an ace in the hand.

2. There is a king in the hand.

Which is more likely to be in the hand, if either: the king or the ace? Prove that you are correct.

It is worth repeating the challenge that Bringsjord issued with this problem: 

I don't even think Bayesian systems can possibly solve logic problems that involve probability. ... I 
would very much like to see a Bayesian system take this declarative information as input, and yield the 
correct answer, and a proof that this is the answer. I assure you that I will not hold my breath. 

Though the computational method presented here is more appropriately called 'Boolean' than 'Bayesian', there is no 
difficulty in solving logic problems that involve probability using parametric probability analysis. Proof, such as it is, 
is supplied by elementary algebra. Let us consider three ways to analyze this ace-king problem: using Boolean poly- 
nomials to simplify the logical formula involved; using parametric probability analysis with subjunctive conditioning; 
and using parametric probability analysis with indicative conditioning. 



POLYNOMIAL SIMPLIFICATION OF LOGICAL FORMULAS Perhaps the easiest way to solve this ace-king problem
is to simplify the logical assertion in it. Using A to represent the proposition that there is an ace in the hand and
K to represent the proposition that there is a king in the hand, the problem specifies the assertion $(K \leftrightarrow A) \leftrightarrow K$.
This compound logical formula simplifies to the atomic formula A after Boolean polynomial representation using
the rules in Table 3. For example using coefficients in the binary finite field $\mathbb{F}_2$ the inner biconditional translates to
the polynomial 1 + K + A. Substituting this value, the entire formula $(1 + K + A) \leftrightarrow K$ translates to the polynomial
1 + (1 + K + A) + K. Recall that using integer arithmetic modulo 2 either elementary value 0 or 1 is its own additive
inverse; thus all terms in this polynomial 1 + 1 + K + A + K cancel out except A. Using real-number coefficients would
generate the same answer (keeping in mind that A can be substituted for $A^2$ and K for $K^2$ due to Boole's 'special law'
constraints $A^2 = A$ and $K^2 = K$). Taking advantage of such polynomial representation and simplification, an equivalent
problem statement would be:

There is an ace in the hand. 

Which is more likely to be in the hand, if either: the king or the ace? 
It is evident that the ace must be equally likely or more likely than the king, since the ace is present with certainty. 
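The simplification over the binary field can be confirmed by evaluating the polynomial at all four combinations of values. A minimal sketch, with hypothetical function names, encoding true as 1 and false as 0:

```python
from itertools import product

def iff_gf2(a, b):
    """Biconditional as a polynomial over the binary field: 1 + a + b (mod 2)."""
    return (1 + a + b) % 2

def ace_king_formula(k, a):
    """(K <-> A) <-> K evaluated with the polynomial 1 + (1 + K + A) + K (mod 2)."""
    return iff_gf2(iff_gf2(k, a), k)

def simplifies_to_ace():
    """Check that the compound formula agrees with the atomic formula A everywhere."""
    return all(ace_king_formula(k, a) == a for k, a in product((0, 1), repeat=2))
```

Because the check covers every truth assignment, agreement on all four cases is equivalent to the algebraic cancellation described above.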



PARAMETRIC PROBABILITY MODEL Next let us construct an explicit parametric probability network for this
ace-king problem, using the technique described in Section 3. We copy the propositional variables A and K into the
probability network as primary variables with the set {T, F} of possible values representing elementary truth and
falsity. We add a third primary variable P which is defined as the value of the compound logical formula $(K \leftrightarrow A) \leftrightarrow K$
asserted in the problem description. We introduce parameters $x_1$ through $x_4$ to specify prior probabilities. Here is the
network graph:

[Network graph on the primary variables A, K, and P with the parameters $x_1$ through $x_4$] (28)



As in Section 3.1 we specify an uninformative prior probability distribution $\Pr_0(A, K)$ on the variables A and K
using the $x_i$ parameters, with the constraints $0 \le x_i \le 1$ and $x_1 + x_2 + x_3 + x_4 = 1$ to enforce the laws of probability.
Following the method of Section 3.2 we construct a conditional probability table $\Pr_0(P \mid A, K)$ to express the definition
$P := ((K \leftrightarrow A) \leftrightarrow K)$. These two component probability tables complete the ace-king model:

[Component probability tables $\Pr_0(A, K)$ and $\Pr_0(P \mid A, K)$] (29)



Using this parametric probability network we can compare the probabilities that A and K are true, given the condition 
that P is true — using both subjunctive and imperative formulations to express the condition. 



ANALYSIS IN THE SUBJUNCTIVE MOOD Using the subjunctive idiom for conditioning, we desire the difference
between the conditional probabilities Pr (A = T | P = T) and Pr (K = T | P = T). The relevant fractional polynomial
values are included in the result tables for the queries Pr (A | P) and Pr (K | P), computed as described in Section 4.2:

[Result tables for the queries Pr (A | P) and Pr (K | P)] (30)

The desired difference uses the first element of each table, combined by elementary algebra:

$$ (x_1 + x_2)/(x_1 + x_2) - (x_1)/(x_1 + x_2) \;\Rightarrow\; (x_2)/(x_1 + x_2) \tag{31} $$



To answer the question posed in the problem we must determine the minimum and maximum feasible values of this
difference, subject to the constraints on the parameters involved. These extreme values are the solutions to the following
pair of optimization problems, which share a set of linear constraints and a fractional linear objective function:

$$
\begin{array}{ll}
\text{minimize:} & (x_2)/(x_1 + x_2) \\
\text{subject to:} & x_1 + x_2 + x_3 + x_4 = 1 \\
\text{and:} & 0 \le x_i \le 1 \quad (i = 1, \ldots, 4)
\end{array}
\qquad
\begin{array}{ll}
\text{maximize:} & (x_2)/(x_1 + x_2) \\
\text{subject to:} & x_1 + x_2 + x_3 + x_4 = 1 \\
\text{and:} & 0 \le x_i \le 1 \quad (i = 1, \ldots, 4)
\end{array}
\tag{32}
$$



One way to solve these fractional linear programs is through the Charnes-Cooper transformation, which converts them
into ordinary linear programs [7]. After such reformulation, standard linear optimization finds the minimum value 0
which is achieved at the point $(x_1 = 1, x_2 = 0, x_3 = 0, x_4 = 0)$ and the maximum value 1 which is achieved at the point
$(x_1 = 0, x_2 = 1, x_3 = 0, x_4 = 0)$. Thus the difference Pr (A = T | P = T) - Pr (K = T | P = T) is bounded by zero and
one. This implies the inequality:

$$ \Pr(A = T \mid P = T) \ge \Pr(K = T \mid P = T) \tag{33} $$

In other words, given the condition $P := ((K \leftrightarrow A) \leftrightarrow K)$ stated in the problem description, it is at least as likely that
there is an ace in the hand as a king.
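As an informal cross-check on these bounds, the difference can be evaluated exactly on a rational grid over the probability simplex. This is a brute-force sketch and not the Charnes-Cooper method; it assumes the four parameters are the joint probabilities of (A, K) = (T, T), (T, F), (F, T), (F, F) in that order, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def iff(a, b):
    """Biconditional on Boolean values."""
    return a == b

def subjunctive_difference(x1, x2, x3, x4):
    """Pr(A=T | P=T) - Pr(K=T | P=T), or None when Pr(P=T) = 0."""
    joint = {(True, True): x1, (True, False): x2,
             (False, True): x3, (False, False): x4}
    # P := (K <-> A) <-> K, evaluated in each state (a, k).
    holds = {(a, k): iff(iff(k, a), k) for (a, k) in joint}
    pr_p = sum(w for s, w in joint.items() if holds[s])
    if pr_p == 0:
        return None
    pr_a = sum(w for s, w in joint.items() if holds[s] and s[0])
    pr_k = sum(w for s, w in joint.items() if holds[s] and s[1])
    return (pr_a - pr_k) / pr_p

def grid_bounds(steps=10):
    """Scan a rational grid on the simplex; return (min, max) of the difference."""
    values = []
    for i, j, k in product(range(steps + 1), repeat=3):
        if i + j + k > steps:
            continue
        point = [Fraction(n, steps) for n in (i, j, k, steps - i - j - k)]
        d = subjunctive_difference(*point)
        if d is not None:
            values.append(d)
    return min(values), max(values)
```

A grid scan cannot prove the bounds, but since the grid includes the two extreme points named above it recovers the same minimum and maximum.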



PARAMETRIC PROBABILITY IN THE INDICATIVE MOOD We can find the equivalent solution using the same parametric
probability network model, but with an imperative rather than subjunctive query formulation. In this idiom
our objective function is the difference Pr (A = T) - Pr (K = T) between the unconditional probabilities that there is
an ace versus a king in the hand. The relevant probabilities, computed as in Section 4.2, appear in the results for the
probability-table queries Pr (A) and Pr (K):

[Result tables for the queries Pr (A) and Pr (K)] (34)



The requisite difference $(x_1 + x_2) - (x_1 + x_3)$ simplifies to $x_2 - x_3$. Continuing on, in this indicative formulation the
condition in the problem statement is now modeled as the additional constraint Pr (P = T) = 1, using this result for
the query Pr (P):

$$
\begin{array}{c|c}
P & \Pr(P) \\ \hline
T & x_1 + x_2 \\
F & x_3 + x_4
\end{array}
\tag{35}
$$

Therefore we formulate the following pair of optimization problems to find the minimum and maximum feasible 
values of the difference between the probability of the ace and the king, subject to the constraints that the assertion in 
the problem description is true and that the laws of probability are followed: 

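$$
\begin{array}{ll}
\text{minimize:} & x_2 - x_3 \\
\text{subject to:} & x_1 + x_2 + x_3 + x_4 = 1 \\
 & x_1 + x_2 = 1 \\
\text{and:} & 0 \le x_i \le 1 \quad (i = 1, \ldots, 4)
\end{array}
\qquad
\begin{array}{ll}
\text{maximize:} & x_2 - x_3 \\
\text{subject to:} & x_1 + x_2 + x_3 + x_4 = 1 \\
 & x_1 + x_2 = 1 \\
\text{and:} & 0 \le x_i \le 1 \quad (i = 1, \ldots, 4)
\end{array}
\tag{36}
$$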

These are simple linear programs whose objective functions are not fractional. Solving them with standard methods
yields the minimum value 0 which is achieved at the point $(x_1 = 1, x_2 = 0, x_3 = 0, x_4 = 0)$ and the maximum value 1
which is achieved at the point $(x_1 = 0, x_2 = 1, x_3 = 0, x_4 = 0)$. In other words, subject to the constraint Pr (P = T) = 1
that the formula $(K \leftrightarrow A) \leftrightarrow K$ asserted in the problem description is true, the difference Pr (A = T) - Pr (K = T)
between the probability of the ace and the king is bounded by zero and one. It follows that $\Pr(A = T) \ge \Pr(K = T)$
and therefore that the ace is equally likely or more likely than the king, subject to the constraint Pr (P = T) = 1 that
the assertion in the problem statement is true.
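These two linear programs are small enough to solve by hand, which gives a check on the solver output. A worked sketch using only the constraints stated above:

```latex
\begin{align*}
(x_1 + x_2 + x_3 + x_4) - (x_1 + x_2) = 1 - 1
  &\implies x_3 + x_4 = 0
  \implies x_3 = x_4 = 0 \quad (\text{since } x_3, x_4 \ge 0), \\
x_2 - x_3 = x_2 \ \text{with}\ x_1 + x_2 = 1,\ x_1 \ge 0,\ x_2 \ge 0
  &\implies 0 \le x_2 - x_3 \le 1.
\end{align*}
```

The lower bound is attained at $x_2 = 0$ (so $x_1 = 1$) and the upper bound at $x_2 = 1$ (so $x_1 = 0$), matching the reported solution points.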



7.2 The Monster from Paris 

Next we examine a problem from Paris, Muino, and Rosefield concerning inconsistent propositions [26]. We shall see
that the parameterized consequence relation ${}^{\eta}\triangleright_{\zeta}$ and the related concepts of maximal consistency and primary and
secondary probability thresholds presented by these authors are simply indirect ways of describing linear optimization
problems and their solutions. For this example there is a hypothetical creature about which there are three propositions:
P that it is a chicken killer; Q that it is Japanese; and R that it is a salamander. There are also three compound formulas
used as axiom-like assertions (here designated $S_1$, $S_2$, and $S_3$):

$$ S_1 := (P \wedge Q) \qquad S_2 := (\neg(Q \wedge R) \wedge P) \qquad S_3 := (R \wedge (\neg P \to (R \wedge Q))) \tag{37} $$

Additionally, there are several more compound formulas which are used as queries (here designated $S_4$ through $S_8$):

$$ S_4 := (P \wedge R) \qquad S_5 := (P \wedge (Q \vee R)) \qquad S_6 := R \qquad S_7 := \neg R \qquad S_8 := (R \wedge \neg R) \tag{38} $$

The probability-network graph, which also includes the parameters $x_1$ through $x_8$, is shown here:

[Network graph on the primary variables P, Q, R and $S_1$ through $S_8$ with the parameters $x_1$ through $x_8$] (39)



For this parametric probability network we specify an uninformative probability distribution $\Pr_0(P, Q, R)$ on the
primary variables P, Q, and R copied from the propositional variables, using the $x_i$ parameters as in Section 3.1:

[Component probability table $\Pr_0(P, Q, R)$] (40)

Each parameter $x_i$ is subject to the constraint $0 \le x_i \le 1$ and they are collectively constrained by $\sum_i x_i = 1$. Following
the method of Section 3.2 we construct a conditional probability table for each compound formula:

[Conditional probability tables for the compound formulas, including $\Pr_0(S_5 \mid P, Q, R)$, $\Pr_0(S_6 \mid R)$, and $\Pr_0(S_7 \mid R)$]
Now let us ask, is it possible that the propositions $P \wedge Q$, $\neg(Q \wedge R) \wedge P$, and $R \wedge (\neg P \to (R \wedge Q))$ hold simultaneously?
Since these are the compound formulas defined as $S_1$, $S_2$, and $S_3$ in Equation 37, the computed probability table
$\Pr(S_1, S_2, S_3)$ provides the answer:

[Result table for the query $\Pr(S_1, S_2, S_3)$]

There are two impossible cases: that $S_1$, $S_2$, and $S_3$ hold simultaneously; and that $S_1$ holds but $S_2$ and $S_3$ do not.
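The two impossible cases can be confirmed independently by enumerating the eight states of (P, Q, R). This is a plain truth-table sketch with hypothetical function names, not the probability-table query itself; it lists the combinations of truth values for the three axiom-like formulas that no state produces.

```python
from itertools import product

def s1(p, q, r):
    return p and q                      # S1 := P and Q

def s2(p, q, r):
    return (not (q and r)) and p        # S2 := not(Q and R) and P

def s3(p, q, r):
    return r and (p or (r and q))       # S3 := R and (not P -> (R and Q))

def impossible_cases():
    """Return the (S1, S2, S3) combinations that no state of (P, Q, R) produces."""
    states = list(product((True, False), repeat=3))
    reachable = {(s1(*w), s2(*w), s3(*w)) for w in states}
    return sorted(set(states) - reachable, reverse=True)
```

The unreachable combinations are exactly those that receive probability zero for every feasible assignment of the parameters.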

Next let us address the matter of 'maximal consistency', which is a property defined by Paris et al. to describe how
compatible a set of logical formulas are with one another. According to their definition we seek the maximum threshold
value such that the probability of each proposition in some set $\Gamma$ attains at least that threshold; this maximum value
is designated $\eta$. This definition of maximal consistency describes a linear optimization problem. In particular, the
maximal consistency of the set $\{S_1, S_2, S_3\}$ of formulas from Equation 37 can be formulated as the linear optimization
problem in which we ask for the maximum value of a new parameter z subject to the constraints $\Pr(S_1 = T) \ge z$,
$\Pr(S_2 = T) \ge z$, and $\Pr(S_3 = T) \ge z$. This is straightforward to construct using parametric probability analysis. First
we compute the marginal probability distribution on each compound formula in the set $\{S_1, S_2, S_3\}$ as in Section 4.2:

[Result tables for $\Pr(S_1)$, $\Pr(S_2)$, and $\Pr(S_3)$, whose first elements are $x_1 + x_2$, $x_2 + x_3 + x_4$, and $x_1 + x_3 + x_5$ respectively]
Then using the first element of each table in its respective constraint (along with the usual constraints to enforce the
laws of probability) we construct the following optimization problem:

$$
\begin{array}{ll}
\text{maximize:} & z \\
\text{subject to:} & x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 1 \\
 & x_1 + x_2 \ge z \\
 & x_2 + x_3 + x_4 \ge z \\
 & x_1 + x_3 + x_5 \ge z \\
\text{and:} & 0 \le x_i \le 1 \quad (i = 1, \ldots, 8) \\
 & 0 \le z \le 1
\end{array}
\tag{46}
$$

Solving this linear program yields the maximum value $\eta = 0.667$ which is achieved at the following point:

$$ (x_1 = 0.333,\ x_2 = 0.333,\ x_3 = 0.333,\ x_4 = 0.000,\ x_5 = 0.000,\ x_6 = 0.000,\ x_7 = 0.000,\ x_8 = 0.000,\ z = 0.667) \tag{47} $$
Variable   Formula                 Threshold $\zeta$
$S_4$      $P \wedge R$            0.667
$S_5$      $P \wedge (Q \vee R)$   1.000
$S_6$      $R$                     0.667
$S_7$      $\neg R$                0.333
$S_8$      $R \wedge \neg R$       0.000

Table 5: Secondary threshold probabilities $\zeta$ for the formulas from Equation 38 using the maximal consistency
$\eta = 0.667$ of the formulas $\{S_1, S_2, S_3\}$ from Equation 37 as the primary threshold probability.



This agrees with the result reported in [26]. Note that, although the author's polynomial-optimization solver used
floating-point arithmetic to compute the optimization results displayed here, there are exact rational solvers for linear 
programs which would instead calculate the solution precisely as the fraction 2/3. 
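To illustrate the remark about exact arithmetic, the maximal-consistency linear program is small enough to solve by enumerating basic feasible solutions with rational numbers. This is a didactic sketch with hypothetical function names, not the author's solver and not a practical LP method; it assumes the three threshold constraints with first elements $x_1 + x_2$, $x_2 + x_3 + x_4$, and $x_1 + x_3 + x_5$, rewrites them as equalities with nonnegative slack variables, and omits the upper bounds because they are implied by the other constraints.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(matrix, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def maximize(c, A, b):
    """Maximize c.v subject to A v = b, v >= 0 by enumerating basic feasible solutions."""
    m, n = len(A), len(c)
    best_value, best_point = None, None
    for basis in combinations(range(n), m):
        sub = [[A[r][j] for j in basis] for r in range(m)]
        sol = solve_square(sub, b)
        if sol is None or any(v < 0 for v in sol):
            continue
        point = [Fraction(0)] * n
        for j, v in zip(basis, sol):
            point[j] = v
        value = sum(ci * vi for ci, vi in zip(c, point))
        if best_value is None or value > best_value:
            best_value, best_point = value, point
    return best_value, best_point

def maximal_consistency():
    """Return (eta, point) for the program above; columns are x1..x8, z, s1..s3."""
    A = [
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],    # x1 + ... + x8 = 1
        [1, 1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0],  # x1 + x2 - z - s1 = 0
        [0, 1, 1, 1, 0, 0, 0, 0, -1, 0, -1, 0],  # x2 + x3 + x4 - z - s2 = 0
        [1, 0, 1, 0, 1, 0, 0, 0, -1, 0, 0, -1],  # x1 + x3 + x5 - z - s3 = 0
    ]
    b = [1, 0, 0, 0]
    c = [0] * 8 + [1] + [0] * 3                  # objective: maximize z
    return maximize(c, A, b)
```

Under these assumptions the enumeration returns the exact fraction 2/3 for the threshold, with $x_1 = x_2 = x_3 = 1/3$, instead of the rounded value 0.667.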

Moving on, we can formulate additional optimization problems to compute for other logical formulas the 'secondary
threshold probability' designated $\zeta$, when the solution value $\eta$ above is used as the 'primary threshold probability'.
By the definition in [26] we seek the maximum probability $\zeta$ of each queried formula subject to the constraints
that every formula in the designated set $\Gamma$ must attain probability at least $\eta$. Again this definition describes certain
linear optimization problems. For this example, to find secondary threshold probabilities of the formulas defined as $S_4$
through $S_8$ in Equation 38 we first calculate their symbolic polynomial probabilities:

[Result tables for $\Pr(S_4)$ through $\Pr(S_8)$]
Now to compute the secondary threshold probability for the query $S_4 := (P \wedge R)$ relative to the set $\{S_1, S_2, S_3\}$ using the
primary threshold probability $\eta$, we query the maximum value of $\Pr(S_4 = T)$ subject to the constraints $\Pr(S_1 = T) \ge z$,
$\Pr(S_2 = T) \ge z$, and $\Pr(S_3 = T) \ge z$ where the auxiliary parameter z is now fixed at the desired primary threshold
$\eta$. Using the solution $\eta = 0.667$ from the optimization problem in Equation 46 as the primary threshold probability,
we construct this optimization problem to compute the secondary threshold probability for the formula $S_4 := (P \wedge R)$:

$$
\begin{array}{ll}
\text{maximize:} & x_1 + x_3 \\
\text{subject to:} & x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 1 \\
 & x_1 + x_2 \ge z \\
 & x_2 + x_3 + x_4 \ge z \\
 & x_1 + x_3 + x_5 \ge z \\
\text{and:} & 0 \le x_i \le 1 \quad (i = 1, \ldots, 8) \\
 & z = 0.667
\end{array}
\tag{50}
$$

Solving this linear program yields the maximum value $\zeta = 0.667$ for the queried formula $S_4 := (P \wedge R)$. By similar
analysis we compute secondary threshold probabilities for the remaining formulas in Equation 38 with the results
displayed in Table 5. Notably for the query formula R the secondary threshold $\zeta = 1/3$ was reported in [26]; but the
maximal solution to the optimization problem suggested by the text is twice this value.
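A short hand derivation suggests why the secondary thresholds are pinned to single values when the primary threshold is taken as exactly 2/3 (rather than the rounded 0.667). Adding the three threshold constraints, with first elements $x_1 + x_2$, $x_2 + x_3 + x_4$, and $x_1 + x_3 + x_5$, gives:

```latex
\begin{align*}
2x_1 + 2x_2 + 2x_3 + x_4 + x_5 &\ge 3 \cdot \tfrac{2}{3} = 2, \\
2x_1 + 2x_2 + 2x_3 + x_4 + x_5 &\le 2\,(x_1 + \cdots + x_8) = 2
  \quad \text{with equality only if } x_4 = \cdots = x_8 = 0, \\
x_1 + x_2 = x_2 + x_3 = x_1 + x_3 = \tfrac{2}{3} &\implies x_1 = x_2 = x_3 = \tfrac{1}{3}.
\end{align*}
```

The last line follows because, once $x_4$ through $x_8$ vanish, the three left-hand sides each must be at least 2/3 while summing to exactly 2. The feasible set is therefore a single point, at which $\Pr(S_4 = T) = x_1 + x_3 = 2/3$. If one further assumes that $x_1$, $x_3$, $x_5$, and $x_7$ are the joint probabilities of the states with R = T (an assumption, since the input table is not reproduced here), then $\Pr(R = T) = 2/3$ as well, consistent with the value in Table 5.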
7.3 Goodman's Hot Buttered Conditionals

Next let us visit Goodman's treatment of counterfactual conditional statements using his opening example from Fact,
Fiction, and Forecast [11]. In his own words:

What, then, is the problem about counterfactual conditionals? Let us confine ourselves to those in which 
antecedent and consequent are inalterably false — as, for example, when I say of a piece of butter that was 
eaten yesterday, and that had never been heated, 

If that piece of butter had been heated to 150° F., it would have melted.

Considered as truth-functional compounds, all counterfactuals are of course true, since their antecedents 
are false. Hence 

If that piece of butter had been heated to 150° F., it would not have melted

would also hold. Obviously something different is intended, and the problem is to define the circum- 
stances under which a given counterfactual holds while the opposing counterfactual with the contradictory 
consequent fails to hold. 

Let us use H to represent the proposition that the butter had been heated, and M for the proposition that it melted. Each
variable has the set {T, F} of possible elementary values representing truth and falsity. It is indeed the case that given
the axiom that H is false, both statements of material implication $H \to M$ and $H \to \neg M$ are true. In fact, Boolean
polynomial representation shows that the conjunction $(H \to M) \wedge (H \to \neg M)$ simplifies to the negation $\neg H$. In other
words, interpreting them as statements of material implication, the logical conjunction of the two opposing conditional
statements above is equivalent to the single unconditional statement:

That piece of butter had not been heated to 150° F.

If it is desired to meet Goodman's dichotomy criterion (that one of the conditional statements holds but the opposing
statement with the contradictory consequent does not hold) then it is not wise to model his conditional sentences as
statements of material implication.
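The claimed equivalence between the conjunction of the two opposing material conditionals and the negation of the antecedent can be verified by a four-row truth table. A minimal sketch with hypothetical function names:

```python
from itertools import product

def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b

def opposing_conditionals(h, m):
    """(H -> M) and (H -> not M)."""
    return implies(h, m) and implies(h, not m)

def equivalent_to_not_h():
    """Check that the conjunction agrees with not-H for every truth assignment."""
    return all(opposing_conditionals(h, m) == (not h)
               for h, m in product((True, False), repeat=2))
```

The check succeeds for all four assignments, so under the material-implication reading the pair of conditionals says nothing about melting at all.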

Parametric probability networks and direct probability encoding provide the desired 'something different' to model
conditional and counterfactual statements. The resulting models and analysis meet Goodman's dichotomy criterion
and give otherwise reasonable results. To illustrate, let us build a parametric probability network including the primary
variables H and M to represent the propositional variables, along with the primary variables $C_1 := (H \to M)$ and
$C_2 := (H \to \neg M)$ to represent the corresponding statements of material implication; we add parameters x, y, and z.
Here is the network graph:

[Network graph on the primary variables H, M, $C_1$, and $C_2$ with the parameters x, y, and z] (51)

First we specify an uninformative prior distribution on H and M using parametric distributions $\Pr_0(H)$ and $\Pr_0(M \mid H)$,
with constraints $0 \le x \le 1$, $0 \le y \le 1$, and $0 \le z \le 1$:

$$
\Pr_0(H):\quad
\begin{array}{c|c}
H & \Pr_0(H) \\ \hline
T & x \\
F & 1 - x
\end{array}
\qquad
\Pr_0(M \mid H):\quad
\begin{array}{c|cc}
H & M = T & M = F \\ \hline
T & y & 1 - y \\
F & z & 1 - z
\end{array}
\tag{52}
$$
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Let us interpret Goodman's first sentence as the constraint Pro (M = T | H = T) = 1 that the input value of the condi- 
tional probability that M is true given that H is true is one; likewise let us interpret his second sentence as the constraint 
Pro (M = T | H = T) = or the equivalent Pro (M = F | H = T) = 1. Now it is a matter of elementary algebra that these 
statements are inconsistent: the first says y = 1 and the others say y — 0. This algebraic inconsistency does not involve 
the parameter x that encodes the prior probability Pro (H = T) that the butter had been heated. In other words, these 
opposing conditional sentences are inconsistent specifically because their consequents disagree and not because they 
are counterf actual. Additionally, in this interpretation these conditional sentences do not have existential import. Nei- 
ther constraint y — 1 nor y = requires that H must certainly be true (x — 1), nor even that H must possibly be true 
(x > 0); these constraints on y do not affect x at all. 

Next let us compute the conditional probability $\Pr(M \mid H)$ of whether the butter melted given whether it had been 
heated, alongside the marginal probability $\Pr(M)$ of whether the butter melted (integrating the cases that it had or had 
not been heated) and the marginal probability $\Pr(H)$ of whether the butter had been heated: 

    H    $\Pr(M \mid H)$:   M = T               M = F 
    T                       (xy)/(x)            (x - xy)/(x) 
    F                       (z - xz)/(1 - x)    (1 - x - z + xz)/(1 - x) 

    M    $\Pr(M)$                  H    $\Pr(H)$ 
    T    z + xy - xz               T    x 
    F    1 - z - xy + xz           F    1 - x 

These computed probability tables already tell an interesting story about Goodman's conditional sentences. First, 
the output probability $\Pr(H = T)$ that the butter had been heated has exactly the same value as the input probability 
$\Pr_0(H = T)$, namely the parameter x. However the output probability $\Pr(M = T \mid H = T)$ differs in a subtle way from 
the input probability $\Pr_0(M = T \mid H = T)$; the input value is the parameter y but the output value is the quotient xy/x. 
Therefore if it is certain that the butter had not been heated (x = 0) then the computed conditional probability xy/x that 
the butter melted given this now-impossible condition is algebraically indeterminate: it is the quotient 0/0. 

Notwithstanding this exceptional conditional probability, the overall marginal probability that the butter melted 
subject to the constraint x = 0 that the butter certainly had not been heated does not involve division by zero; in fact it 
does not involve division at all. The computed probability $\Pr(M = T)$, whose value is the polynomial $z + xy - xz$, 
simplifies to z when x = 0. In other words, subject to the constraint that the butter certainly had not been heated, the 
probability that the butter melted is exactly the value z specified as the input component probability $\Pr_0(M = T \mid H = F)$. 
If the user has not provided any more information about z then its precise value remains indeterminate; it is only 
constrained by $0 \le z \le 1$ to satisfy the laws of probability. In the terminology of Section [6TT] we have just 
considered subjunctive and imperative modes of asking the same question: What is the probability that M is true, given the 
condition that H is false? In either formulation the answer is the same: The queried probability is indeterminate. 
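
The contrast between the output quotient xy/x and the marginal polynomial z + xy - xz can be made concrete with exact rational arithmetic. This is a small sketch under stated assumptions, not the symbolic inference performed by PQL: the function names are hypothetical, the numeric values of y and z are arbitrary, and `None` is used here as a stand-in for the indeterminate quotient 0/0.

```python
from fractions import Fraction
from typing import Optional


def pr_m_true(x: Fraction, y: Fraction, z: Fraction) -> Fraction:
    """Marginal Pr(M = T), the polynomial z + x*y - x*z (no division)."""
    return z + x * y - x * z


def pr_m_true_given_h_true(x: Fraction, y: Fraction) -> Optional[Fraction]:
    """Output conditional Pr(M = T | H = T) as the quotient (x*y)/(x).

    Returns None when x == 0, because the quotient is then 0/0.
    """
    numerator, denominator = x * y, x
    if denominator == 0:
        return None
    return numerator / denominator


y, z = Fraction(3, 4), Fraction(1, 5)  # arbitrary illustrative values

# Butter certainly not heated: the marginal collapses to z, the conditional is 0/0.
print(pr_m_true(Fraction(0), y, z), pr_m_true_given_h_true(Fraction(0), y))

# Any positive prior on heating recovers the input value y as the conditional.
print(pr_m_true_given_h_true(Fraction(1, 2), y))
```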

We have just analyzed Goodman's counterfactual conditional sentences using parametric probability directly, without 
intermediate formulas from the propositional calculus. However it is instructive to embed the formulas $H \rightarrow M$ 
and $H \rightarrow \neg M$ into our probability network and to compute the probabilities associated with them. Using the 
definitions $C_1 := (H \rightarrow M)$ and $C_2 := (H \rightarrow \neg M)$ and the method of Section 3.2 we construct the following conditional 
probability tables (here labeled with the embedded formulas instead of with the primary-variable names $C_1$ and $C_2$): 

    $\Pr_0((H \rightarrow M) \mid H, M)$            $\Pr_0((H \rightarrow \neg M) \mid H, M)$ 
    H  M    T    F                                  H  M    T    F 
    T  T    1    0                                  T  T    0    1 
    T  F    0    1                                  T  F    1    0 
    F  T    1    0                                  F  T    1    0 
    F  F    1    0                                  F  F    1    0 

We compute the marginal probabilities that each embedded formula is true: 

    $(H \rightarrow M)$    $\Pr((H \rightarrow M))$        $(H \rightarrow \neg M)$    $\Pr((H \rightarrow \neg M))$ 
    T                      1 - x + xy                      T                           1 - xy 
    F                      x - xy                          F                           xy 

(55) 

Solving either equation $1 - x + xy = 1 - xy$ or $x - xy = xy$ reveals that there are exactly two cases in which the opposing 
statements of material implication have the same probability of truth: when x = 0 (in which case each statement is true 
with certainty) and when y = 1/2 (in which case each statement is true with probability $1 - \frac{1}{2}x$). In other words, if it is 
certain a priori that the butter had not been heated (x = 0) then both formulas $H \rightarrow M$ and $H \rightarrow \neg M$ must certainly be 
true. However if there is exactly a 50% chance that the butter melted after it had been heated (y = 1/2) then it is also 
equally likely that the formulas $H \rightarrow M$ and $H \rightarrow \neg M$ are true; but now their mutual probability ($1 - \frac{1}{2}x$) is one minus 
half the prior probability that the butter had been heated. 
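
For readers who want the intermediate algebra, the two cases follow from a single factorization. The derivation below is a worked restatement of the equation above, using only the marginal probabilities already computed for the two embedded formulas.

```latex
\begin{align*}
1 - x + xy &= 1 - xy \\
2xy - x &= 0 \\
x\,(2y - 1) &= 0
  \quad\Longrightarrow\quad x = 0 \ \text{ or } \ y = \tfrac{1}{2}, \\[4pt]
x = 0:&\quad 1 - x + xy = 1 = 1 - xy, \\
y = \tfrac{1}{2}:&\quad 1 - x + \tfrac{1}{2}x = 1 - \tfrac{1}{2}x = 1 - xy .
\end{align*}
```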

The relation that both formulas $H \rightarrow M$ and $H \rightarrow \neg M$ are equally likely to be true is different from the relation 
that both formulas are simultaneously true; both relations occur when x = 0 but only the former occurs when y = 1/2 
and $x \neq 0$. Evaluating the joint probability $\Pr(H, C_1, C_2)$ of H and both statements of material implication provides 
additional detail that is hidden in the marginal probabilities above: 

    Index    H    $C_1$    $C_2$    $\Pr(H, C_1, C_2)$ 
    1        T    T        T        0 
    2        T    T        F        xy 
    3        T    F        T        x - xy 
    4        T    F        F        0 
    5        F    T        T        1 - x 
    6        F    T        F        0 
    7        F    F        T        0 
    8        F    F        F        0 

(56) 

This probability table shows that it is impossible that the antecedent H and both opposing statements of material 
implication are simultaneously true (the probability of this event, which appears in row 1 of the table, is identically 
zero). Moreover when H is true exactly one of the opposing statements of material implication must hold (see rows 1 
through 4). But when H is false both statements of material implication must be true (see rows 5 through 8). To recover 
the probabilities in Equation 55 from the probabilities in Equation 56 it is necessary to add appropriate elements of 
the latter probability table. For example $\Pr((H \rightarrow M) = T)$ is given by the sum (0) + (xy) + (1 - x) + (0) of the 
polynomials from rows 1, 2, 5, and 6, which yields 1 - x + xy. 
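
The joint distribution of H with both embedded formulas can also be obtained numerically by summing over M. The sketch below evaluates it at one arbitrary parameter point with exact fractions; it is an illustration under assumed values (x = 1/3, y = 1/4, z = 1/2) and a hypothetical function name, not the symbolic computation that produces the polynomial table.

```python
from fractions import Fraction
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication."""
    return (not p) or q


def joint_h_c1_c2(x: Fraction, y: Fraction, z: Fraction) -> dict:
    """Return {(H, C1, C2): probability}, marginalizing over M.

    Inputs follow the model: Pr0(H=T) = x, Pr0(M=T|H=T) = y, Pr0(M=T|H=F) = z,
    with C1 := (H -> M) and C2 := (H -> not M).
    """
    table = {key: Fraction(0) for key in product([True, False], repeat=3)}
    for h, m in product([True, False], repeat=2):
        pr_h = x if h else 1 - x
        pr_m_true = y if h else z
        pr_m = pr_m_true if m else 1 - pr_m_true
        table[(h, implies(h, m), implies(h, not m))] += pr_h * pr_m
    return table


x, y, z = Fraction(1, 3), Fraction(1, 4), Fraction(1, 2)  # arbitrary point
joint = joint_h_c1_c2(x, y, z)

# Marginal probability that (H -> M) is true: add every row with C1 = T.
pr_c1_true = sum(p for (h, c1, c2), p in joint.items() if c1)
print(pr_c1_true == 1 - x + x * y)
```

At this point the entry for H true with both implications true is zero, and every entry with H false and either implication false is zero as well, matching the qualitative reading given above.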



7.4 Smullyan's Knights, Knaves, and Zombies 

Finally we consider two of Smullyan's problems from What Is the Name of This Book? which were also analyzed by 
Kolany using a rather different technique [30], [17]. The first is a basic knights and knaves problem for which we must simply 
answer a probability-table query; the second is a zombie problem in which we must first answer a probability-table 
query and then perform a search to find certain values of the parameters in the resulting probability table. 



WE ARE THE KNIGHTS WHO SAY. . . The background for the first problem (number 36 in [30]) is that on the 
imagined island, knights always tell the truth and knaves always lie. 

Once when I visited the island of knights and knaves, I came across two of the inhabitants resting under 
a tree. I asked one of them, "Is either of you a knight?" He responded, and I knew the answer to my 
question. 

What is the person to whom I addressed the question — is he a knight or knave; And what is the other one? 

In the parametric probability model let us use the variable A to represent the proposition that the respondent is a knight, 
and B for the proposition that the other inhabitant is a knight. We define $Q := (A \vee B)$ to represent the true answer 
to the question of whether either inhabitant is a knight. We introduce R to represent the response that is given: R is 
true if the response is 'yes' and false if the response is 'no'. By Smullyan's rules for the island, if Q were true then 
the inhabitant would respond 'yes' if and only if he were a knight; and if Q were false then the responses would be 
opposite. Hence the definition $R := (A \leftrightarrow Q)$ using the biconditional. Here is the network graph, which also includes 
the parameters $x_1$ through $x_4$: 




(57) 

We specify an uninformative parametric probability distribution on A and B using the $x_i$ parameters with the usual 
constraints $0 \le x_i \le 1$ and $\sum_i x_i = 1$ as in Section 3.1; we construct the appropriate component probability tables for Q 
and R according to Section 3.2. 

We compute the joint probability of the identities A and B conditioned on the response R: 

    $\Pr(A, B \mid R)$ 
    R    A = T, B = T                A = T, B = F                A = F, B = T    A = F, B = F 
    T    (x_1)/(x_1 + x_2 + x_4)     (x_2)/(x_1 + x_2 + x_4)     0               (x_4)/(x_1 + x_2 + x_4) 
    F    0                           0                           (x_3)/(x_3)     0 

This result table for $\Pr(A, B \mid R)$ gives the solution. An answer of 'yes' (R = T) leaves three possible configurations 
of identities, but an answer of 'no' (R = F) leaves only one possible configuration of identities: that the responder is 
a knave and the other inhabitant is a knight (A = F, B = T). Therefore if the identities of the inhabitants are known 
with certainty after the response, then the response must have been 'no', the responder must be a knave, and his fellow 
inhabitant must be a knight. 
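
The same conclusion can be reached by listing which configurations of identities are compatible with each response. This is a minimal enumeration sketch, assuming every configuration has nonzero prior probability; the function names are introduced here for illustration and are not part of PQL.

```python
from itertools import product


def response(a: bool, b: bool) -> bool:
    """R := (A <-> Q) with Q := (A or B); True stands for the answer 'yes'."""
    q = a or b
    return a == q


def consistent_worlds(r: bool) -> list:
    """All (A, B) configurations under which the respondent would answer r."""
    return [(a, b) for a, b in product([True, False], repeat=2)
            if response(a, b) == r]


yes_worlds = consistent_worlds(True)
no_worlds = consistent_worlds(False)
print(len(yes_worlds), no_worlds)
```

Only the answer 'no' leaves a single configuration, so it is the only response after which the visitor could know the identities with certainty.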



ZOMBIELAND For the second problem (number 160 in [30]) we visit Smullyan's island of zombies. The custom 
here is that zombies always lie and humans always tell the truth. However instead of 'yes' and 'no' the inhabitants 
answer questions with 'Bal' or 'Da'; one means 'yes' and the other means 'no' but we do not know which is which. 
The problem asks the following: 

Suppose you are not interested in what "Bal" means, but only in whether the speaker is a zombie. How 
can you find this out in only one question? (Again, he will answer "Bal" or "Da.") 

For the parametric probability model of this problem we use the variable H to represent whether the speaker is human 
(H = T) or zombie; B to represent whether 'Bal' means 'yes' (B = T) or 'no'; Q for the true answer to the unknown 
question that is sought; and R for whether the speaker gives the response 'Bal' (R = T) or 'Da'. Here is the network 
graph: 




(60) 

The prior distribution $\Pr_0(H, B)$ on H and B is specified using parameters $x_1$ through $x_4$ with $0 \le x_i \le 1$ and $\sum_i x_i = 1$. 
Additional parameters $t_1$ through $t_4$ with each $t_i \in \{0, 1\}$ are introduced as described in Section 3.3 to specify the 
component probability table $\Pr_0(Q \mid H, B)$ of the unknown question Q. Here are both component probability tables: 

    H  B    $\Pr_0(H, B)$          H  B    $\Pr_0(Q \mid H, B)$:   Q = T    Q = F 
    T  T    x_1                    T  T                            t_1      1 - t_1 
    T  F    x_2                    T  F                            t_2      1 - t_2 
    F  T    x_3                    F  T                            t_3      1 - t_3 
    F  F    x_4                    F  F                            t_4      1 - t_4 

(61) 



The speaker's response R can be modeled in the propositional calculus using nested biconditionals to represent the 
rules of the island. The formula $H \leftrightarrow Q$ reveals whether the speaker will answer in the affirmative. For example a 
zombie (H = F) will dishonestly provide an affirmative answer to a question that is actually false (Q = F) but a human 
will honestly provide a negative answer to a question that is actually false. Relating this inner biconditional to whether 
'Bal' means 'yes' using another biconditional $(H \leftrightarrow Q) \leftrightarrow B$ then reveals whether the speaker will answer 'Bal' (if 
this outer biconditional is true) or 'Da'. For example, if the speaker will answer in the affirmative, then he will respond 
with 'Bal' if and only if that means 'yes'. The following component probability table $\Pr_0(R \mid H, Q, B)$ constructed as 
in Section 3.2 implements the definition $R := ((H \leftrightarrow Q) \leftrightarrow B)$: 

    $\Pr_0(R \mid H, Q, B)$ 
    H  Q  B    R = T    R = F 
    T  T  T    1        0 
    T  T  F    0        1 
    T  F  T    0        1 
    T  F  F    1        0 
    F  T  T    0        1 
    F  T  F    1        0 
    F  F  T    1        0 
    F  F  F    0        1 

It takes two phases of analysis to find a question that will determine whether the speaker is human or zombie. For 
the primary analysis we compute the joint probability $\Pr(R, H)$ of each identity and each response using symbolic 
probability-network inference as in Section 4.2: 

    Index    R    H    $\Pr(R, H)$ 
    1        T    T    x_2 + t_1 x_1 - t_2 x_2 
    2        T    F    x_3 - t_3 x_3 + t_4 x_4 
    3        F    T    x_1 - t_1 x_1 + t_2 x_2 
    4        F    F    x_4 + t_3 x_3 - t_4 x_4 

(63) 

Table 6 Secondary analysis to distinguish humans from zombies, using the polynomials in the result table for $\Pr(R, H)$ 
shown in Equation 63 instantiated at different values of the parameters $t_1$ through $t_4$. We search for values of $(t_1, t_2, t_3, t_4)$ 
such that exactly one of $\Pr(R = T, H = T)$ or $\Pr(R = T, H = F)$ is zero, and exactly one of $\Pr(R = F, H = T)$ or 
$\Pr(R = F, H = F)$ is zero. The values at rows 6 and 11 meet these criteria. 
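
The four polynomials in Equation 63 can be cross-checked by enumerating the model directly at a numeric point. The sketch below is only a sanity check under assumptions: the ordering of the (H, B) configurations for $x_1$ through $x_4$ and $t_1$ through $t_4$ is taken to be TT, TF, FT, FF, the numeric values are arbitrary, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed ordering of (H, B) configurations for x_1..x_4 and t_1..t_4.
WORLDS = [(True, True), (True, False), (False, True), (False, False)]


def joint_r_h(x, t) -> dict:
    """Pr(R, H) by summing x_i * Pr0(Q | H, B) over B and Q, with R := ((H <-> Q) <-> B)."""
    table = {key: Fraction(0) for key in product([True, False], repeat=2)}
    for (h, b), x_i, t_i in zip(WORLDS, x, t):
        for q in (True, False):
            pr_q = t_i if q else 1 - t_i
            r = ((h == q) == b)
            table[(r, h)] += x_i * pr_q
    return table


def closed_form(x, t) -> dict:
    """The polynomials of Equation 63, keyed by (R, H)."""
    x1, x2, x3, x4 = x
    t1, t2, t3, t4 = t
    return {
        (True, True): x2 + t1 * x1 - t2 * x2,
        (True, False): x3 - t3 * x3 + t4 * x4,
        (False, True): x1 - t1 * x1 + t2 * x2,
        (False, False): x4 + t3 * x3 - t4 * x4,
    }


x = (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10))
t = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5))
print(joint_r_h(x, t) == closed_form(x, t))
```

Fractional values of t are used here only to exercise every term of each polynomial; in the model itself each $t_i$ is restricted to 0 or 1.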



For the secondary analysis we must find values of the parameters $(t_1, t_2, t_3, t_4)$ such that the question Q encoded by 
those values successfully discriminates between humans and zombies. With regard to the result table for $\Pr(R, H)$ 
in Equation 63, successful discrimination requires that after substitution of the chosen values of the $t_i$ parameters, 
exactly one of the first two elements of $\Pr(R, H)$ is identically zero and that exactly one of the second two elements is 
identically zero. As in Section 5.3 we can set up a simple exhaustive search to find such values by substituting each 
of the sixteen possible values of $(t_1, t_2, t_3, t_4)$ with each $t_i \in \{0, 1\}$ into each of the four polynomials in Equation 63. 
Table 6 shows these substituted polynomial values. There are two vectors of parameter values that meet the search 
criteria: (0, 1, 0, 1) at row 6 and (1, 0, 1, 0) at row 11. With reference to Equation 61 here is the question Q instantiated 
using the second solution $(t_1, t_2, t_3, t_4) = (1, 0, 1, 0)$: 



    $\Pr(Q \mid H, B)$ 
    H  B    Q = T    Q = F 
    T  T    1        0 
    T  F    0        1 
    F  T    1        0 
    F  F    0        1 

(64) 
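
The exhaustive search over the sixteen candidate questions is small enough to reproduce in a few lines. The following is a sketch of one way to run it, not the PQL decision-network formulation listed in the appendix: each polynomial is represented by its list of coefficients on $x_1$ through $x_4$, "identically zero" is taken to mean that all coefficients vanish, and the (H, B) ordering TT, TF, FT, FF is assumed.

```python
from itertools import product

# Assumed ordering of (H, B) configurations for x_1..x_4 and t_1..t_4.
WORLDS = [(True, True), (True, False), (False, True), (False, False)]


def coefficients(t) -> dict:
    """Coefficient of each x_i in Pr(R, H), keyed by (R, H), for t in {0,1}^4."""
    coeffs = {key: [0, 0, 0, 0] for key in product([True, False], repeat=2)}
    for i, ((h, b), t_i) in enumerate(zip(WORLDS, t)):
        q = bool(t_i)            # with t_i in {0, 1} the true answer Q is fixed
        r = ((h == q) == b)      # R := ((H <-> Q) <-> B)
        coeffs[(r, h)][i] = 1
    return coeffs


def is_identically_zero(poly) -> bool:
    return not any(poly)


def discriminates(t) -> bool:
    """Exactly one of rows 1-2 and exactly one of rows 3-4 is identically zero."""
    c = coefficients(t)
    first = is_identically_zero(c[(True, True)]) != is_identically_zero(c[(True, False)])
    second = is_identically_zero(c[(False, True)]) != is_identically_zero(c[(False, False)])
    return first and second


# Rows are numbered from 1 in the order 0000, 0001, ..., 1111.
solutions = [(row, t)
             for row, t in enumerate(product((0, 1), repeat=4), start=1)
             if discriminates(t)]
print(solutions)
```

With this row numbering the two vectors that pass the test sit at rows 6 and 11, in agreement with the search criteria stated above.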



Using this solution the primary variable Q expresses the logical formula B representing whether 'Bal' means 'yes'; in 
other words Q := B. In this case the joint probability of identity and response, obtained by substituting the selected 
parameter values into Equation 63, is given by: 



    Index    R    H    $\Pr(R, H)$ 
    1        T    T    x_1 + x_2 
    2        T    F    0 
    3        F    T    0 
    4        F    F    x_3 + x_4 

(65) 
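
The pattern of zeros in Equation 65 can be confirmed by simulating the island's rules for the question Q := B under both meanings of 'Bal'. This is an illustrative sketch; `spoken_response` is a hypothetical helper, not part of the PQL model.

```python
def spoken_response(is_human: bool, bal_means_yes: bool, truth: bool) -> str:
    """Word uttered in reply to a question whose true answer is `truth`."""
    affirmative = (is_human == truth)           # humans tell the truth, zombies lie
    says_bal = (affirmative == bal_means_yes)   # translate yes/no into Bal/Da
    return "Bal" if says_bal else "Da"


# Ask "Does 'Bal' mean 'yes'?", so the true answer equals bal_means_yes.
answers = {
    (is_human, bal_means_yes): spoken_response(is_human, bal_means_yes, bal_means_yes)
    for is_human in (True, False)
    for bal_means_yes in (True, False)
}
print(answers)
```

The uttered word depends only on whether the speaker is human, never on what 'Bal' means, which is why the question separates the two kinds of speaker without revealing the language.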



In other words, the question "Does 'Bal' mean 'yes'?" will reliably distinguish a human speaker from a zombie: a 
human (H = T) must answer 'Bal' (R = T) but a zombie must answer 'Da'. The other solution (0, 1, 0, 1) for the 
parameters $(t_1, t_2, t_3, t_4)$ gives the negation of this question which works just as well; in response to the question "Does 
'Da' mean 'yes'?" (corresponding to the formula $\neg B$) a human must answer 'Da' but a zombie must answer 'Bal'. 
Notably, Kolany incorrectly matched the question from the first solution with the responses from the second solution: 
". . . we could ask him whether Bal means Yes. If he answers Bal, he is a zombie." [17] 

Note that certain prior assumptions about whether the speaker is human versus zombie will lead to exceptions. 
Here are the computed probability distributions $\Pr(H)$ and $\Pr(R)$: 

    H    $\Pr(H)$         R    $\Pr(R)$ 
    T    x_1 + x_2        T    x_1 + x_2 
    F    x_3 + x_4        F    x_3 + x_4 

(66) 



Had it been specified a priori that there were no humans on the island (with the constraint $x_1 + x_2 = 0$) then it would 
be impossible under the rules of the island for the speaker to answer 'Bal'; in other words $\Pr(R = T)$, which has the 
polynomial value $x_1 + x_2$, would also be constrained to equal zero. Likewise had it been specified that there were no 
zombies on the island ($x_3 + x_4 = 0$) then it would be impossible for the speaker to answer 'Da' (R = F). 

8 CONCLUSION 



The method of parametric probability analysis, introduced and illustrated above, demonstrates that probability and 
classical logic are not only compatible but also complementary. To adopt a popular turn of phrase, there is no daylight 
between logic and probability. Many so-called 'logic' problems are more specifically probability problems, because 
they require reasoning about the probabilities of formulas from the propositional calculus. For such problems it 
is useful to embed logical formulas inside parametric probability networks. Many other logic problems are better 
represented directly as parametric probability networks, without use of the propositional calculus or of first-order 
logic at all. Parametric probability analysis complements classical logic by providing a powerful set of computational 
tools for modeling and reasoning about implication, consequence, and quantification. 

A PROBABILITY MODEL SOURCE CODE 



A.1 Ace-King 

// ace-king.pql: Johnson-Laird Acta Psych 1996 via Bringsford J Applied Logic 2008 

primary A { label = "There is an ace in the hand"; states = binary; } 
primary K { label = "There is a king in the hand"; states = binary; } 
clique _C; potential ( _C : A K ) { parametric (x) ; } 

primary P { 
  label = "Value of $((K \leftrightarrow A) \leftrightarrow K)$"; 
  states = binary; 
} 

probability ( P | A K ) { function = ( "P <-> ( (K <-> A) <-> K) ? 1 : 0" ) ; } 

A.2 Amphibian 

// amphibian.pql: adopted from Paris, Muino, Rosefield 2009 

primary P { label = "Chicken killer"; states = binary; } 
primary Q { label = "Japanese"; states = binary; } 
primary R { label = "Salamander"; states = binary; } 

// fully parametric prior distribution, no independence assumptions 
clique _C; probability ( _C : P Q R ) { parametric (x) ; } 

// beliefs (like axioms) 
primary S_1 { label = "Value of $(P \wedge Q)$"; states = binary; } 
probability ( S_1 | P Q ) { function = ( "S_1 <-> P && Q ? 1 : 0" ) ; } 
primary S_2 { label = "Value of $(\neg (Q \wedge R) \wedge P)$"; states = binary; } 
probability ( S_2 | P Q R ) { function = ( "S_2 <-> !(Q && R) && P ? 1 : 0" ) ; } 
primary S_3 { label = "$S_3 :: R \wedge (\neg P \rightarrow (R \wedge Q))$"; states = binary; } 
probability ( S_3 | P Q R ) { function = ( "S_3 <-> R && (!P -> (R && Q)) ? 1 : 0" ); } 

// queries 
primary S_4 { label = "Value of $(S_4 :: P \wedge R)$"; states = binary; } 
probability ( S_4 | P R ) { function = ( "S_4 <-> P && R ? 1 : 0" ) ; } 
primary S_5 { label = "Value of $(P \wedge (Q \vee R))$"; states = binary; } 
probability ( S_5 | P Q R ) { function = ( "S_5 <-> P && (Q || R) ? 1 : 0" ) ; } 
primary S_6 { label = "Value $(R)$"; states = binary; } 
probability ( S_6 | R ) { function = ( "S_6 <-> R ? 1 : 0" ) ; } 
primary S_7 { label = "Value of $(\neg R)$"; states = binary; } 
probability ( S_7 | R ) { function = ( "S_7 <-> !R ? 1 : 0" ); } 
primary S_8 { label = "Value of $(R \wedge \neg R)$"; states = binary; } 
probability ( S_8 | R ) { function = ( "S_8 <-> R && !R ? 1 : 0" ) ; } 

net { graph = "rankdir = TB;"; } 
net { graph = 'subgraph { rank=same; "P" ; "Q" ; "R"}' ; } 

A.3 Counterfactual Conditions 

// butter.pql: Goodman's counterfactual from Fact, Fiction, and Forecast 

parameter x { range = (0,1); } 
parameter y { range = (0,1); } 
parameter z { range = (0,1); } 

primary H { label = "The butter was heated"; states = binary; } 
probability ( H ) { data = (x, 1-x) ; } 
primary M { label = "The butter melted"; states = binary; } 
probability ( M | H ) { data = (y, 1-y, z, 1-z) ; } 

primary C_1 { 
  label = "Value of $(H \rightarrow M)$"; tex = "(H \rightarrow M)"; 
  states = binary; 
} 
probability ( C_1 | H M ) { function = "C_1 <-> H -> M ? 1 : 0"; } 

primary C_2 { 
  label = "Value of $(H \rightarrow \neg M)$"; tex = "(H \rightarrow \neg M)"; 
  states = binary; 
} 
probability ( C_2 | H M ) { function = "C_2 <-> H -> !M ? 1 : 0" ; } 

net { graph = 'subgraph { rank=same; "H" ; "M"; }'; } 

A.4 Knight or Knave 

// knight2.pql: Kolany's example 2, from Smullyan 1978 #36 p. 23 
// I asked one of them, "Is either of you a knight?" 

primary A { label = "A is a knight"; states = binary; } 
primary B { label = "B is a knight"; states = binary; } 
clique _C; probability ( _C : A B ) { parametric (x) ; } 

primary Q { label = "Question: $A \vee B$"; states = binary; } 
probability ( Q | A B ) { function = ( "(Q <-> A || B) ? 1 : 0" ) ; } 

primary R { label = "A's response: $A \leftrightarrow Q$"; states = binary; } 
probability ( R | Q A ) { function = ( "R <-> (Q <-> A) ? 1 : 0" ) ; } 

net { graph = 'subgraph { rank=same; "Q"; "R"; }'; } 

A.5 Human or Zombie: Primary Analysis 

// zombie1.pql: Kolany's example 3 from Smullyan 1978 #160 p. 150 

primary H { label = "Speaker is human"; states = binary; } 
primary B { label = "'Bal' means 'yes'"; states = binary; } 

primary Q { label = "Question (parametric in $t_i$)"; states = binary; } 
probability ( Q | H B ) { parametric (t) ; } 

clique _C; probability ( _C : H B ) { parametric (x) ; } 

primary R { 
  label = "Response is 'Bal': $((H \leftrightarrow Q) \leftrightarrow B)$"; 
  states = binary; 
} 
probability ( R | H Q B ) { function = ( "R <-> ((H <-> Q) <-> B) ? 1 : 0" ) ; } 

net { graph = 'subgraph { rank=same; "Q"; "R"; }'; } 
net { graph = 'subgraph { rank=max; "t1"; "t2"; "t3"; "t4"; }'; } 

A.6 Human or Zombie: Secondary Analysis 

// zombie1-search.pql: Kolany's example 3 from Smullyan 1978 #160 p. 150 

decision t[1] { states = values (0,1); } 
decision t[2] { states = values (0,1); } 
decision t[3] { states = values (0,1); } 
decision t[4] { states = values (0,1); } 

// sequence of decisions; not important here 
probability ( t[1] ) {} 
probability ( t[2] | t[1] ) {} 
probability ( t[3] | t[2] ) {} 
probability ( t[4] | t[3] ) {} 

// to allow substitution of t[i] values; not really decision-theoretic utilities 
utility U_1 { tex = "\prob{R=\state{T},H=\state{T}}" ; range = (0,1); } 
utility U_2 { tex = "\prob{R=\state{T},H=\state{F}}" ; range = (0,1); } 
utility U_3 { tex = "\prob{R=\state{F},H=\state{T}}" ; range = (0,1); } 
utility U_4 { tex = "\prob{R=\state{F},H=\state{F}}" ; range = (0,1); } 

// polynomials are from the result Pr( R, H ) using zombie1.pql 
probability ( U_1 | t[1] t[2] t[3] t[4] ) { function = "x2 + t1*x1 - t2*x2"; } 
probability ( U_2 | t[1] t[2] t[3] t[4] ) { function = "x3 - t3*x3 + t4*x4" ; } 
probability ( U_3 | t[1] t[2] t[3] t[4] ) { function = "x1 - t1*x1 + t2*x2"; } 
probability ( U_4 | t[1] t[2] t[3] t[4] ) { function = "x4 + t3*x3 - t4*x4" ; } 

primary H { label = "Speaker is human"; states = binary; } 
primary B { label = "'Bal' means 'yes'"; states = binary; } 

// Question 11: Does 'Bal' mean 'yes'? 
set t[1] = 1; set t[2] = 0; set t[3] = 1; set t[4] = 0; 

primary Q { label = "Question (parametric in $t_i$)"; states = binary; } 
probability ( Q | H B ) { parametric (t) ; } 

clique _C; probability ( _C : H B ) { parametric (x) ; } 

primary R { 
  label = "Response is 'Bal': value of $(Q \leftrightarrow (H \leftrightarrow B))$"; 
  states = binary; 
} 
probability ( R | Q H B ) { function = ( "R <-> (Q <-> (H <-> B)) ? 1 : 0" ) ; } 

net { graph = 'subgraph { rank=same; "Q"; "R"; }'; } 
net { graph = 'subgraph { rank=max; "t1"; "t2"; "t3"; "t4"; }'; } 